Add workaround for using std::fma for scalar multiply-add.
This is mainly to provide backward compatibility. The new macro should not be used in new code, and should generally be avoided where possible.
Background: Eigen introduced then removed several uses of
std::fma for scalar multiply-add operations. It was added
to increase precision and boost performance on systems that
support FMA in hardware. But it turned out to significantly
slow down multiply-adds on systems that do not: 2-3x for Intel CPUs,
and 30x for WASM builds (#2959). We then limited the usage to only cases
where hardware FMA is available. This ensures consistency
between vectorized and non-vectorized paths, and keeps the
higher precision only when it will not affect performance.
Unfortunately, several projects seem to have added tests that rely on
the intermediate behavior, where std::fma was always used, but do not
build with hardware FMA instructions enabled. These tests now break.
To ease the transition, we introduce this temporary flag.
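
For illustration, here is a minimal sketch of why results differ between the fused and unfused paths, and of what an opt-in switch could look like. It is not Eigen's actual implementation: the macro name DEMO_FORCE_STD_FMA and the helper names are hypothetical, and the standard FP_FAST_FMA macro from <cmath> stands in for whatever hardware-FMA detection Eigen really uses.

```cpp
#include <cmath>

// Hypothetical opt-in switch, for illustration only (not Eigen's macro name).
// #define DEMO_FORCE_STD_FMA 1

// Unfused multiply-add: the product is rounded before the addition.
// The volatile temporary keeps the compiler from contracting the two
// operations into a single FMA instruction.
inline double madd_unfused(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Fused multiply-add: a * b + c is computed with a single final rounding.
// Without hardware FMA, std::fma falls back to a software routine,
// which is where the slowdown described above comes from.
inline double madd_fused(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Assumed dispatch: use std::fma only when the platform reports a fast
// FMA, or when the (hypothetical) opt-in macro is defined.
inline double madd(double a, double b, double c) {
#if defined(DEMO_FORCE_STD_FMA) || defined(FP_FAST_FMA)
  return madd_fused(a, b, c);
#else
  return madd_unfused(a, b, c);
#endif
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the fused form returns -2^-54 while the unfused form returns 0, because the exact product 1 - 2^-54 rounds to 1 before the addition. A test that compares against the fused value will therefore fail on a build without hardware FMA unless the fused path is forced.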