Rawat et al., 2016 - Google Patents

Resource conscious reuse-driven tiling for GPUs

Rawat et al., 2016

Document ID: 13566322152352172489
Author: Rawat P; Hong C; Ravishankar M; Grover V; Pouchet L; Rountev A; Sadayappan P
Publication year: 2016
Publication venue: Proceedings of the 2016 International Conference on Parallel Architectures and Compilation

External Links

Cited by

Snippet

Computations involving successive application of 3D stencil operators are widely used in many application domains, such as image processing, computational electromagnetics, seismic processing, and climate modeling. Enhancement of temporal and spatial locality via …

Continue reading at dl.acm.org (PDF) (other versions)

230000002123 temporal effect 0 abstract description 4

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/45—Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions
- G06F8/456—Parallelism detection
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/445—Exploiting fine grain parallelism, i.e. parallelism at instruction level
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5045—Circuit design
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3457—Performance evaluation by simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/78—Power analysis and optimization

Similar Documents

Publication	Publication Date	Title
Rawat et al.	2016	Resource conscious reuse-driven tiling for GPUs
Wahib et al.	2014	Scalable kernel fusion for memory-bound GPU applications
Ardalani et al.	2015	Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
Mendez-Lojo et al.	2012	A GPU implementation of inclusion-based points-to analysis
US8930926B2 (en)	2015-01-06	System, methods and apparatus for program optimization for multi-threaded processor architectures
US8572590B2 (en)	2013-10-29	Methods and apparatus for joint parallelism and locality optimization in source code compilation
Luporini et al.	2015	Cross-loop optimization of arithmetic intensity for finite element local assembly
Zhang et al.	2010	Variable partitioning and scheduling for MPSoC with virtually shared scratch pad memory
Cattaneo et al.	2015	On how to accelerate iterative stencil loops: a scalable streaming-based approach
Soliman et al.	2017	WCET-driven dynamic data scratchpad management with compiler-directed prefetching
Sioutas et al.	2020	Schedule synthesis for halide pipelines on gpus
Sampaio et al.	2014	Divergence analysis
Sabne et al.	2013	Scaling large-data computations on multi-GPU accelerators
Domagała et al.	2016	Register allocation and promotion through combined instruction scheduling and loop unrolling
Oancea et al.	2012	Financial software on GPUs: between Haskell and Fortran
Pananilath et al.	2015	An optimizing code generator for a class of lattice-boltzmann computations
Spector et al.	2024	Thunderkittens: Simple, fast, and adorable ai kernels
Stawinoga et al.	2018	Predictable thread coarsening
Abdelaal et al.	2021	Tile size selection of affine programs for GPGPUs using polyhedral cross-compilation
Selva et al.	2019	Building a polyhedral representation from an instrumented execution: Making dynamic analyses of nonaffine programs scalable
Kim et al.	2024	Optimal Model Partitioning with Low-Overhead Profiling on the PIM-based Platform for Deep Learning Inference
WO2010121228A2 (en)	2010-10-21	System, methods and apparatus for program optimization for multi-threaded processor architectures
Sharma et al.	2015	Data layout optimization for portable performance
Baradaran et al.	2008	A compiler approach to managing storage and memory bandwidth in configurable architectures
Kim et al.	2006	Exploiting reference idempotency to reduce speculative storage overflow