Handling ldexp over/underflows
Overflows and underflows in the current implementations of pldexp are producing garbage and are causing issues with other ops, specifically the new vectorized pow.
In our current implementations of pldexp(a, e), we do the following (using float as an example)
- Construct c = 2^e via (
(e+127)<<23) - Multiply a * c
Step 1) is leading to bad results when e < -127 or e > 128. For example,
-
pldexp(1, -200)returns-7.20576e+16instead of zero -
pldexp(1, 200)returns-1.38778e-17instead ofinf
Step 2) is leading to loss of accuracy if a is small and c is big (or vice versa).
We could clamp the exponent to the valid range [-127, 128]. This will fix the garbage results, but we still lose accuracy compared to std::ldexp. For example,
-
std::ldexp(2^-100, 200)returns 2^100, but the clamping approach would produceinf.
I do have a potential implementation (!375 (closed)) to preserve accuracy, but it's rather complicated and more difficult to apply to SSE/AVX/AVX512 which currently all have custom specializations for double.
Also, the generic solution is about 7x slower than the current one. The clamped implementation is 1.3x slower. We do have cases of pldexp where we know the exponent is in the valid range, and it would be nice to use the faster version for those.
@rmlarsen1 @chhtz @tellenbach any thoughts on the best way forwards? Do we introduce a pldexp_safe and pldexp_unsafe? Should we go with the clamped version, take the 30% speed hit and the loss of accuracy?