Adaptive parallel tiled code generation and accelerated auto-tuning
Document Type
Conference Proceeding
Publication Date
11-1-2013
Abstract
Tiling is an important program transformation that is often used to enhance cache locality and to obtain coarse-grained parallelism. In this paper, we address the problem of generating adaptive parametric tiled code for parallel execution contexts; in other words, generating parallel tiled code in which tile sizes can be changed on the fly during execution. Changing tile sizes during pipelined parallel execution of tiles presents a fundamental code-generation challenge: the unscanned iteration space may become non-convex. We develop novel solutions for the adaptive parallel tiled code generation problem. Adaptive tiling accelerates auto-tuning for tile size selection: several tile sizes can be tested for performance in a single run of the tiled code. Adaptive tiling is also useful when tile sizes need to be altered dynamically to adapt to a changing execution environment, such as caches that are dynamically resized for power savings. Experimental evaluation on a number of benchmarks demonstrates the effectiveness of the developed approach. © The Author(s) 2013.
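As an illustration only, the idea of testing several tile sizes within a single run can be sketched in Python. This is not the code generator described in the paper: it is sequential, the loop body carries no dependences, and the tile size changes only at the boundary between bands of rows, so the non-convexity problem of pipelined parallel execution never arises. The function name `adaptive_tiled_sum` and its parameters are hypothetical.

```python
import time
from itertools import cycle


def adaptive_tiled_sum(grid, candidate_sizes):
    """Sum a 2-D grid tile by tile, switching tile size between row bands.

    Hypothetical sketch: sequential and dependence-free, so it sidesteps
    the parallel, non-convex case that the paper addresses.
    Returns (total, elapsed, work), where elapsed[s] is the time spent in
    bands that used tile size s and work[s] is the number of points those
    bands covered.
    """
    if not candidate_sizes or any(s <= 0 for s in candidate_sizes):
        raise ValueError("candidate_sizes must contain positive integers")

    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    total = 0
    elapsed = {s: 0.0 for s in candidate_sizes}
    work = {s: 0 for s in candidate_sizes}

    sizes = cycle(candidate_sizes)
    row = 0
    while row < n_rows:
        size = next(sizes)  # the tile size may change at every band boundary
        row_end = min(row + size, n_rows)
        start = time.perf_counter()
        # Scan the band as a sequence of size x size tiles (clipped at edges).
        for col in range(0, n_cols, size):
            col_end = min(col + size, n_cols)
            for i in range(row, row_end):
                for j in range(col, col_end):
                    total += grid[i][j]
        elapsed[size] += time.perf_counter() - start
        work[size] += (row_end - row) * n_cols
        row = row_end
    return total, elapsed, work
```

Dividing `elapsed[s]` by `work[s]` gives a per-point cost for each candidate, which a tuner could compare after one pass instead of one run per tile size.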
Publication Source (Journal or Book title)
International Journal of High Performance Computing Applications
First Page
412
Last Page
425
Recommended Citation
Tavarageri, S., Ramanujam, J., & Sadayappan, P. (2013). Adaptive parallel tiled code generation and accelerated auto-tuning. International Journal of High Performance Computing Applications, 27(4), 412-425. https://doi.org/10.1177/1094342013493939