
Add workaround for using std::fma for scalar multiply-add.

This is mainly to provide backward compatibility. The new macro should not be used in new code, and should be avoided elsewhere whenever possible.

Background: Eigen introduced, then removed, several uses of std::fma for scalar multiply-add operations. std::fma was added to increase precision and to boost performance on systems that support FMA in hardware. It turned out to significantly slow down multiply-adds on systems that do not: 2-3x on Intel CPUs, and 30x for WASM builds (#2959 (closed)). We then restricted std::fma to cases where hardware FMA is available. This keeps the vectorized and non-vectorized paths consistent, and retains the higher precision only where it does not cost performance.

Unfortunately, several projects appear to have written new tests that depend on the intermediate behavior, in which std::fma was always used, even though they build without hardware FMA instructions enabled. Those tests now break.

To ease the transition, we introduce this temporary flag.
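
For illustration only, here is a minimal sketch of why results can differ between the fused and unfused paths, and of how a compile-time switch of this kind might look. The names `DEMO_FORCE_STD_FMA`, `madd_unfused`, `madd_fused`, and `demo_madd` are hypothetical and are not Eigen's actual macro or functions. The check on `__FMA__` assumes a GCC/Clang x86 build and stands in for whatever hardware-FMA detection the library really uses.

```cpp
#include <cmath>

// Hypothetical opt-in switch for this sketch only; NOT Eigen's real macro name.
// #define DEMO_FORCE_STD_FMA 1

// Unfused multiply-add: the product is rounded to double before the addition.
inline double madd_unfused(double a, double b, double c) {
  // volatile forces the intermediate rounding and keeps the compiler
  // from contracting the expression into a hardware FMA.
  volatile double prod = a * b;
  return prod + c;
}

// Fused multiply-add: a*b + c is computed with a single rounding.
// Without hardware support, std::fma falls back to slow software emulation.
inline double madd_fused(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Use the fused path only when hardware FMA is advertised (assumed here to be
// signalled by __FMA__), or when the hypothetical override is defined.
inline double demo_madd(double a, double b, double c) {
#if defined(__FMA__) || defined(DEMO_FORCE_STD_FMA)
  return madd_fused(a, b, c);
#else
  return madd_unfused(a, b, c);
#endif
}
```

With `a = 1 + 2^-27`, `b = 1 - 2^-27`, and `c = -1`, the exact product is `1 - 2^-54`, which rounds to `1.0` in double precision. The unfused path therefore returns `0`, while the fused path returns `-2^-54`. A test that hard-codes the fused result would fail on a build that takes the unfused path, which is the kind of breakage described above.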
