Document Type
Conference Proceeding
Publication Date
4-6-2011
Abstract
Stencil computations are at the core of applications in many domains, such as computational electromagnetics, image processing, and partial differential equation solvers used in a variety of scientific and engineering applications. Short-vector SIMD instruction sets such as SSE and VMX provide a promising and widely available avenue for enhancing performance on modern processors. However, a fundamental memory stream alignment issue limits the performance achieved by stencil computations on modern short-vector SIMD architectures. In this paper, we propose a novel data layout transformation that avoids the stream alignment conflict, along with a static analysis technique for determining where this transformation is applicable. Significant performance increases are demonstrated for a variety of stencil codes on three modern SIMD-capable processors. © 2011 Springer-Verlag.
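The alignment conflict mentioned in the abstract can be illustrated with a 3-point stencil, b[i] = a[i-1] + a[i] + a[i+1]. In the usual layout, the three input streams start at offsets that differ by one element, so at most one of them can be aligned to a vector boundary. The following is a minimal sketch, not the paper's implementation: it assumes a vector length of 4, uses plain Python loops in place of SIMD instructions, and shows one layout (viewing the array as a V x (N/V) matrix and transposing it) in which neighbouring elements fall into the same lane of adjacent vectors. The abstract does not specify the transformation in this detail, and all function names here are hypothetical.

```python
V = 4  # assumed vector length, e.g. four floats in a 128-bit register


def stencil_reference(a):
    """3-point stencil on the interior points, original layout."""
    return [a[i - 1] + a[i] + a[i + 1] for i in range(1, len(a) - 1)]


def to_transposed_layout(a, v=V):
    """View a as a v x m row-major matrix (m = n / v) and transpose it.

    Element a[r * m + c] moves to position c * v + r, so the v elements
    that share a column index c form one aligned vector.
    """
    n = len(a)
    assert n % v == 0 and n // v >= 2
    m = n // v
    out = [0] * n
    for r in range(v):
        for c in range(m):
            out[c * v + r] = a[r * m + c]
    return out


def from_transposed_layout(t, v=V):
    """Inverse of to_transposed_layout."""
    n = len(t)
    m = n // v
    out = [0] * n
    for r in range(v):
        for c in range(m):
            out[r * m + c] = t[c * v + r]
    return out


def stencil_transposed(t, v=V):
    """Apply the same stencil directly on the transposed layout."""
    n = len(t)
    m = n // v
    out = [0] * n
    # Interior columns: left and right neighbours are the vectors at
    # columns c - 1 and c + 1, each starting at a multiple of v.
    for c in range(1, m - 1):
        for lane in range(v):  # one SIMD add per operand on real hardware
            out[c * v + lane] = (
                t[(c - 1) * v + lane] + t[c * v + lane] + t[(c + 1) * v + lane]
            )
    # Boundary columns: the neighbour lives in an adjacent lane of the
    # column at the other end, which corresponds to a lane shift.
    for r in range(v):
        if r > 0:  # column 0; left neighbour is (row r - 1, column m - 1)
            out[r] = t[(m - 1) * v + (r - 1)] + t[r] + t[v + r]
        if r < v - 1:  # column m - 1; right neighbour is (row r + 1, column 0)
            out[(m - 1) * v + r] = (
                t[(m - 2) * v + r] + t[(m - 1) * v + r] + t[r + 1]
            )
    return out


def stencil_via_layout(a, v=V):
    """Transform, compute, and transform back; returns interior points."""
    t = stencil_transposed(to_transposed_layout(a, v), v)
    return from_transposed_layout(t, v)[1:-1]
```

In this sketch only the two boundary columns need cross-lane handling; every interior column reads three vectors that all start at a multiple of V. The first and last array elements are treated as fixed boundary values and are not updated.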
Publication Source (Journal or Book title)
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
First Page
225
Last Page
245
Recommended Citation
Henretty, T., Stock, K., Pouchet, L., Franchetti, F., Ramanujam, J., & Sadayappan, P. (2011). Data layout transformation for stencil computations on short-vector SIMD architectures. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 6601 LNCS, 225-245. https://doi.org/10.1007/978-3-642-19861-8_13