// This mirrors the generic version, but is written out explicitly
// because generic SIMD requires a nightly compiler.
// Modern x86 machines can do lots of fun stuff;
// this is where the *real* optimizations go.
// Runtime feature detection (`is_x86_feature_detected!`) lives in std,
// so it is not available in no_std builds.
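//
// A minimal sketch of the std / no_std split described above, not this
// crate's actual dispatch code. The names `has_avx2`, `count_byte`,
// `count_byte_scalar` and the `std` Cargo feature are hypothetical and only
// illustrate the idea: with std the CPU is probed at runtime, without it only
// the compile-time target features are known.

/// Portable fallback: counts occurrences of `needle` one byte at a time.
fn count_byte_scalar(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

/// With std on x86, ask the CPU at runtime whether AVX2 is supported.
#[cfg(all(feature = "std", any(target_arch = "x86", target_arch = "x86_64")))]
fn has_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// Otherwise, fall back to what the compiler was told about the target
/// (for example via `-C target-feature=+avx2`).
#[cfg(not(all(feature = "std", any(target_arch = "x86", target_arch = "x86_64"))))]
fn has_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

/// Dispatches to the best available implementation.
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    if has_avx2() {
        // An explicit AVX2 kernel would be called here. This sketch reuses
        // the scalar path so that it stays self-contained.
        return count_byte_scalar(haystack, needle);
    }
    count_byte_scalar(haystack, needle)
}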
/// Modern ARM machines are also quite capable, thanks to NEON.