Functions that work over warps of threads.
Warps in CUDA are groups of 32 threads within a thread block that are dispatched together and execute in SIMT fashion.
Returns the thread’s lane within its warp. This value ranges from 0 to WARP_SIZE - 1 (WARP_SIZE is currently 32 on all architectures).
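
To make the lane range concrete, here is a minimal host-side sketch of the arithmetic, assuming a one-dimensional block in which consecutive thread indices fill consecutive warps. The helper names `lane_of` and `warp_of` are hypothetical and exist only for illustration; this is ordinary CPU code that models the result, not the device function documented here.

```rust
/// Threads per warp; 32 on all current architectures.
const WARP_SIZE: u32 = 32;

/// Hypothetical helper: lane of a thread, given its linear index
/// within a one-dimensional block.
fn lane_of(thread_in_block: u32) -> u32 {
    thread_in_block % WARP_SIZE
}

/// Hypothetical helper: index of the warp (within the block) that
/// the thread belongs to.
fn warp_of(thread_in_block: u32) -> u32 {
    thread_in_block / WARP_SIZE
}

fn main() {
    // Threads 0..=31 form warp 0, threads 32..=63 form warp 1, and so on.
    for t in [0u32, 31, 32, 70] {
        println!("thread {t}: warp {}, lane {}", warp_of(t), lane_of(t));
    }
}
```

The lane always falls in 0 to WARP_SIZE - 1, whatever the size of the block.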
Synchronizes the threads in this warp whose lanes are selected by mask.
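
As a rough illustration of how such a mask can be built, the host-side sketch below assumes the usual CUDA convention that bit i of a 32-bit mask stands for lane i, so a full mask is 0xFFFFFFFF. The names `FULL_MASK`, `mask_for_lanes`, and `lane_in_mask` are hypothetical helpers, not part of the documented API, and the code runs on the CPU only.

```rust
/// Hypothetical constant: a mask naming every lane of a 32-thread warp.
const FULL_MASK: u32 = 0xFFFF_FFFF;

/// Hypothetical helper: build a mask with one bit set per listed lane
/// (bit i corresponds to lane i).
fn mask_for_lanes(lanes: &[u32]) -> u32 {
    lanes.iter().fold(0, |mask, &lane| {
        assert!(lane < 32, "lane index out of range");
        mask | (1u32 << lane)
    })
}

/// Hypothetical helper: true if `lane` is named in `mask`.
fn lane_in_mask(mask: u32, lane: u32) -> bool {
    lane < 32 && (mask >> lane) & 1 == 1
}

fn main() {
    // A mask covering only the lower half of the warp (lanes 0..=15).
    let lanes: Vec<u32> = (0..16).collect();
    let lower_half = mask_for_lanes(&lanes);
    println!("lower half mask: {lower_half:#x}");
    println!("lane 20 in lower half: {}", lane_in_mask(lower_half, 20));
    println!("lane 20 in full mask: {}", lane_in_mask(FULL_MASK, 20));
}
```

On the device, every thread named in the mask is expected to reach the synchronization point with the same mask value.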