Document Type
Conference Proceeding
Publication Date
7-11-2013
Abstract
Stencil computations are an integral component of applications in a number of scientific computing domains. Short-vector SIMD instruction sets are ubiquitous on modern processors and can be used to significantly increase the performance of stencil computations. Traditional approaches to optimizing stencils on these platforms have focused on either short-vector SIMD or data locality optimizations. In this paper, we propose a domain-specific language and compiler for stencil computations that allows stencils to be specified concisely and automates both locality and short-vector SIMD optimizations, along with effective utilization of multi-core parallelism. Loop transformations that enhance data locality and enable load-balanced parallelism are combined with a data layout transformation to increase the performance of stencil computations. Performance increases are demonstrated for a number of stencils on several modern SIMD architectures. © 2013 ACM.
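The abstract mentions a data layout transformation but does not say which one. The Python sketch below illustrates why such a transformation helps SIMD stencils, using dimension-lifted transposition (DLT), a published layout of this kind. It is an illustration under stated assumptions, not the compiler described here. The vector width `V`, the 3-point averaging stencil, and all function names are hypothetical. In the original layout, the windows `a[i-1]`, `a[i]`, `a[i+1]` for consecutive `i` overlap and are mutually unaligned. After viewing the array as a `V x L` matrix and transposing it, each element's neighbours sit in the same lane of the adjacent vectors. Most of the sweep then becomes whole-vector, element-wise arithmetic.

```python
# Illustrative sketch only: a 1-D, 3-point stencil computed directly and in a
# dimension-lifted-transposed (DLT) layout. V (vector width), the averaging
# stencil, and all function names are hypothetical choices for illustration.


def reference_stencil(a):
    """Direct 3-point average; the two global boundary points are copied."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3
    return out


def to_dlt(a, V):
    """View a as a V x L row-major matrix and transpose it to L x V.

    a[k*L + j] moves to t[j*V + k]: lane k of 'vector' j holds a[k*L + j],
    so the neighbours a[i-1] and a[i+1] sit in the same lane of vectors
    j-1 and j+1.
    """
    n = len(a)
    if n % V != 0:
        raise ValueError("array length must be a multiple of V")
    L = n // V
    t = [0.0] * n
    for k in range(V):
        for j in range(L):
            t[j * V + k] = a[k * L + j]
    return t


def from_dlt(t, V):
    """Inverse of to_dlt."""
    n = len(t)
    L = n // V
    a = [0.0] * n
    for k in range(V):
        for j in range(L):
            a[k * L + j] = t[j * V + k]
    return a


def dlt_stencil(t, V):
    """Apply the same stencil directly on the DLT layout."""
    n = len(t)
    L = n // V
    out = list(t)

    def vec(j):
        # The V contiguous elements forming 'vector' j (an aligned load).
        return t[j * V:(j + 1) * V]

    # Interior vectors: each lane combines vectors j-1, j, j+1 element-wise,
    # i.e. whole-vector adds with no intra-vector shuffles.
    for j in range(1, L - 1):
        out[j * V:(j + 1) * V] = [
            (l + c + r) / 3 for l, c, r in zip(vec(j - 1), vec(j), vec(j + 1))
        ]

    # First and last vectors: one neighbour lives in an adjacent lane, which
    # a real SIMD implementation would fetch with a shuffle.
    for j in (0, L - 1):
        for k in range(V):
            i = k * L + j  # index in the original array
            if i == 0 or i == n - 1:
                continue  # global boundary, left unchanged
            left = t[(j - 1) * V + k] if j > 0 else t[(L - 1) * V + (k - 1)]
            right = t[(j + 1) * V + k] if j < L - 1 else t[k + 1]
            out[j * V + k] = (left + t[j * V + k] + right) / 3
    return out


a = [float(x * x % 7) for x in range(16)]
V = 4
assert from_dlt(to_dlt(a, V), V) == a
assert from_dlt(dlt_stencil(to_dlt(a, V), V), V) == reference_stencil(a)
```

The design trade-off is visible in `dlt_stencil`. Only the first and last vectors need cross-lane data. Every other vector is updated with aligned, element-wise operations, at the cost of transforming the data into and out of the new layout.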
Publication Source (Journal or Book title)
Proceedings of the International Conference on Supercomputing
First Page
13
Last Page
24
Recommended Citation
Henretty, T., Veras, R., Franchetti, F., Pouchet, L., Ramanujam, J., & Sadayappan, P. (2013). A stencil compiler for short-vector SIMD architectures. Proceedings of the International Conference on Supercomputing, 13-24. https://doi.org/10.1145/2464996.2467268