Improve Gebp kernel for Arm Neon float.
Submitted by Renjie Liu
Assigned to Nobody
Link to original bugzilla bug (#1624)
Operating system: Android
Description
Created attachment 892
proposed optimization patch
Currently, loadRhs loads a single RHS value and duplicates it across all lanes, and madd then performs a vector-by-vector multiply-add. On Neon the broadcast is unnecessary, because vfmaq_lane can multiply a vector by a single lane of another vector and accumulate the result in one instruction.
On an Arm Neon device (SDM 845, Pixel 3), this gives roughly a 20% performance improvement.
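To illustrate the difference, here is a minimal sketch, not the attached patch and not Eigen's actual gebp code. It assumes a hypothetical 4x4 column-major accumulator tile and one depth step `acc[j][i] += a[i] * b[j]`. The helper names `step_broadcast` and `step_lane` are made up for illustration. `vfmaq_lane_f32` is an AArch64 intrinsic, so the Neon path is only compiled there. On other targets both functions fall back to the same scalar loop.

```c
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_A64_NEON 1
#else
#define HAVE_A64_NEON 0
#endif

/* Hypothetical tile layout: acc[j] holds column j (4 floats).
 * A real kernel keeps the accumulators in registers across the depth
 * loop; the loads/stores here only keep the sketch short. */

/* Baseline: broadcast each RHS value to all lanes, then a full
 * vector-by-vector FMA (the loadRhs + madd pattern). */
void step_broadcast(float acc[4][4], const float a[4], const float b[4]) {
#if HAVE_A64_NEON
    float32x4_t av = vld1q_f32(a);
    for (int j = 0; j < 4; ++j) {
        float32x4_t bj = vdupq_n_f32(b[j]);   /* extra dup per RHS value */
        vst1q_f32(acc[j], vfmaq_f32(vld1q_f32(acc[j]), av, bj));
    }
#else
    for (int j = 0; j < 4; ++j)
        for (int i = 0; i < 4; ++i)
            acc[j][i] += a[i] * b[j];
#endif
}

/* Proposed: load the RHS values once and multiply by a selected lane,
 * so no broadcast is needed. The lane index must be a constant. */
void step_lane(float acc[4][4], const float a[4], const float b[4]) {
#if HAVE_A64_NEON
    float32x4_t av = vld1q_f32(a);
    float32x2_t blo = vld1_f32(b);       /* b[0], b[1] */
    float32x2_t bhi = vld1_f32(b + 2);   /* b[2], b[3] */
    vst1q_f32(acc[0], vfmaq_lane_f32(vld1q_f32(acc[0]), av, blo, 0));
    vst1q_f32(acc[1], vfmaq_lane_f32(vld1q_f32(acc[1]), av, blo, 1));
    vst1q_f32(acc[2], vfmaq_lane_f32(vld1q_f32(acc[2]), av, bhi, 0));
    vst1q_f32(acc[3], vfmaq_lane_f32(vld1q_f32(acc[3]), av, bhi, 1));
#else
    step_broadcast(acc, a, b);           /* scalar fallback: same math */
#endif
}
```

Both variants use a fused multiply-add, so they produce identical results. The lane form only removes the per-value `dup` (a `vdupq_n_f32` per RHS value in the baseline).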
Attachment 892, "proposed optimization patch":
patch.diff
Edited by Eigen Bugzilla