| Name | Modified | Size | Downloads / Week |
|---|---|---|---|
| OpenBLAS 0.3.4 version.tar.gz | 2018-12-02 | 11.8 MB | |
| OpenBLAS 0.3.4 version.zip | 2018-12-02 | 24.3 MB | |
| README.md | 2018-12-02 | 3.5 kB | |
| Totals: 3 Items | | 36.1 MB | 0 |
OpenBLAS 0.3.4 version
common:
- The new, experimental thread-local memory allocation had inadvertently been left enabled for gmake builds in 0.3.3 despite the announcement. It is now disabled by default, and single-threaded builds will keep using the old allocator even if the USE_TLS option is turned on.
- OpenBLAS will now provide enough buffer space for at least 50 threads by default.
- The output of openblas_get_config() now contains the version number.
- A serious thread safety bug in GEMV operation with small M and large N size has been fixed.
- The code will now automatically call blas_thread_init after a fork, if needed, before handling a call to openblas_set_num_threads.
- Accesses to parallelized level3 functions from multiple callers are now serialized to avoid thread races (unless using OpenMP). This should provide better performance than the known-threadsafe (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
- When building LAPACK with gfortran, -frecursive is now (again) enabled by default to ensure correct behaviour.
- The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and CBLAS_LAYOUT as the name of the matrix row/column order option.
- Externally set LDFLAGS are now passed through to the final compile/link steps to facilitate setting platform-specific linker flags.
- A potential race condition during the build of LAPACK (that would usually manifest itself as a failure to build TESTING/MATGEN) has been fixed.
- xHEMV has been changed to stay single-threaded for small input sizes where the overhead of multithreading exceeds any possible gains.
- CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or ThunderX hardware with sizable input.
- Linker flags for the PGI compiler have been updated.
- Behaviour of AXPY with zero increments is now handled in the C interface, correcting the result on at least Intel Atom.
- The result matrix from calling SGELSS with an all-zero input matrix is now zeroed completely.
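The AXPY entry above concerns calls where both increments are zero. As a rough illustration of what such a call is expected to compute, here is a minimal sketch that follows the loop structure of the reference BLAS routine. The name `ref_daxpy` is hypothetical and this is not OpenBLAS's own interface code or kernel; it only shows that with `incx == 0` and `incy == 0` every iteration reads `x[0]` and accumulates into `y[0]`, so the net effect is `y[0] += n * alpha * x[0]`.

```c
#include <assert.h>
#include <stddef.h>

/* Reference-style AXPY: y := alpha * x + y over n logical elements.
 * Hypothetical helper that mirrors the reference BLAS loop; it is not
 * taken from OpenBLAS. */
void ref_daxpy(int n, double alpha, const double *x, int incx,
               double *y, int incy)
{
    assert(x != NULL && y != NULL);
    if (n <= 0 || alpha == 0.0)
        return;

    /* Negative strides start from the far end, as in reference BLAS. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx; /* incx == 0: keep reading the same element */
        iy += incy; /* incy == 0: keep accumulating into the same element */
    }
}
```

Calling `ref_daxpy(3, 0.5, x, 0, y, 0)` with `x[0] = 2.0` and `y[0] = 1.0` leaves `y[0]` at 4.0, since 1.0 is added three times. An optimized kernel that assumes non-zero strides can easily get this corner case wrong, which is presumably why handling it once in the C interface is attractive.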
x86_64:
- Autodetection of AMD Ryzen2 has been fixed (again).
- CMAKE builds now support labeling of an INTERFACE64=1 build of the library with the _64 suffix.
- An AVX512 version of DGEMM has been added, and the AVX512 SGEMM kernel has been sped up by rewriting it with C intrinsics.
- Fixed compilation on RHEL5/CENTOS5 (issue with typename __WAIT_STATUS)
POWER:
- Added support for building on AIX (with gcc and GNU tools from AIX Toolbox).
- CPU type detection has been implemented for AIX.
- CPU type detection has been fixed for NETBSD.
MIPS64:
- AXPY on LOONGSON3A has been corrected to pass "zero increment" utest.
- DSDOT on LOONGSON3A has been fixed.
- The SGEMM microkernel has been hardened against potential data loss.
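For readers unfamiliar with DSDOT: it takes single-precision vectors but accumulates and returns the dot product in double precision. The following is a minimal reference-style sketch of that contract, assuming a hypothetical helper name `ref_dsdot`; it is not the LOONGSON3A kernel that was fixed.

```c
#include <assert.h>
#include <stddef.h>

/* Reference-style DSDOT: dot product of two float vectors, with every
 * product and the running sum held in double precision.
 * Hypothetical helper, not OpenBLAS code. */
double ref_dsdot(int n, const float *x, int incx, const float *y, int incy)
{
    assert(x != NULL && y != NULL);
    double acc = 0.0;
    if (n <= 0)
        return acc;

    /* Negative strides start from the far end, as in reference BLAS. */
    ptrdiff_t ix = (incx < 0) ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = (incy < 0) ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        acc += (double)x[ix] * (double)y[iy];
        ix += incx;
        iy += incy;
    }
    return acc;
}
```

The wider accumulator matters for inputs such as `{16777216.0f, 1.0f}` dotted with `{1.0f, 1.0f}`: the double-precision sum is 16777217.0, a value that a single-precision accumulator cannot represent.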
ARMV8:
- DYNAMIC_ARCH support is now available for 64-bit ARM.
- Cross-compiling for ARMV8 under iOS now works.
- CPU-specific code has been rearranged to make better use of both hardware commonalities and model-specific compiler optimizations.
- XGENE1 has been removed as a TARGET, superseded by the improved generic ARMV8 support.
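A DYNAMIC_ARCH build bundles kernels for several CPU models into one library and selects among them at run time. The sketch below illustrates that general idea with a table of function pointers. All names here (`cpu_kind`, `scal_generic`, `scal_unrolled`, `select_scal_kernel`) are hypothetical, and the actual CPU detection and kernel tables in OpenBLAS are organized differently; this is only a sketch of the dispatch pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by all variants of one (hypothetical) kernel. */
typedef void (*scal_kernel_fn)(size_t n, double alpha, double *x);

/* Portable fallback: x := alpha * x. */
void scal_generic(size_t n, double alpha, double *x)
{
    assert(x != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        x[i] *= alpha;
}

/* Stand-in for a model-specific variant: same result, unrolled by four. */
void scal_unrolled(size_t n, double alpha, double *x)
{
    assert(x != NULL || n == 0);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     *= alpha;
        x[i + 1] *= alpha;
        x[i + 2] *= alpha;
        x[i + 3] *= alpha;
    }
    for (; i < n; i++) /* remaining tail elements */
        x[i] *= alpha;
}

/* Hypothetical result of a run-time CPU probe. */
typedef enum { CPU_GENERIC = 0, CPU_TUNED = 1, CPU_KIND_COUNT } cpu_kind;

static const scal_kernel_fn kernel_table[CPU_KIND_COUNT] = {
    scal_generic,
    scal_unrolled
};

/* Pick a kernel for the detected CPU; unknown models get the fallback. */
scal_kernel_fn select_scal_kernel(cpu_kind detected)
{
    if ((int)detected < 0 || (int)detected >= (int)CPU_KIND_COUNT)
        return scal_generic;
    return kernel_table[detected];
}
```

Falling back to the generic kernel for an unrecognized model echoes the last item above, where a dedicated target was dropped in favour of improved generic ARMV8 support.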
ARMV7:
- Older assembly mnemonics have been converted to UAL form to allow building with clang 7.0.
- Cross-compiling LAPACKE for Android has been fixed again (it was broken by the update to LAPACK 3.7.0 some time ago).