Now records timings for each individual GPU
Fixed parbonded to work correctly with multiple GPUs
Fixed a bug in the stmv test case
Added timers for bonded sections
Fixed stmv.c
GPU code updated with task timers and 3 versions
Working version of GPU code
Use cell-wise maxdx
Fix bug in bonded sets