Adhere to recommended load/store intrinsics for ppc64le
This merge request updates the AltiVec PacketMath.h to use vec_xl and vec_xst, the recommended intrinsics for loading and storing data on the VSU. The change improves matrix multiplication performance by around 7% on matrices of about 1024x1024.
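
For illustration only, here is a minimal sketch of what a load/store pair built on these intrinsics might look like. The names load4, store4, copy4 and check_roundtrip are hypothetical and are not the functions in PacketMath.h. The memcpy fallback is an assumption added so the snippet also compiles on targets without VSX.

```cpp
#include <cassert>
#include <cstring>
#if defined(__VSX__)
#include <altivec.h>
#endif

#if defined(__VSX__)
// VSX path: vec_xl / vec_xst load and store 16 bytes at ptr + offset.
typedef __vector float Packet4f;
inline Packet4f load4(const float* from) { return vec_xl(0, from); }
inline void store4(float* to, Packet4f p) { vec_xst(p, 0, to); }
#else
// Portable stand-in (assumption for illustration): plain 4-float copy.
struct Packet4f { float v[4]; };
inline Packet4f load4(const float* from) {
  Packet4f p;
  std::memcpy(p.v, from, sizeof p.v);
  return p;
}
inline void store4(float* to, Packet4f p) { std::memcpy(to, p.v, sizeof p.v); }
#endif

// Hypothetical helper: copy four floats through one load and one store.
inline void copy4(const float* src, float* dst) { store4(dst, load4(src)); }

// Round-trips four floats and checks that memory order is preserved.
inline void check_roundtrip() {
  float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  float dst[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  copy4(src, dst);
  for (int i = 0; i < 4; ++i) assert(dst[i] == src[i]);
}
```

On ppc64le the first branch is selected when the compiler enables VSX (for example with -mvsx or a suitable -mcpu setting). Elsewhere the fallback keeps the sketch buildable but says nothing about performance.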