The interaction and relative effectiveness of hardware and software data prefetch

Document Type

Conference Proceeding

Publication Date



A major performance limiter in modern processors is the long latency of data cache misses. Both compiler-based and hardware-based prefetching schemes help hide this latency and so improve performance. Compiler techniques infer memory access patterns through code analysis and insert appropriate prefetch instructions. Hardware prefetching techniques work independently of the compiler by monitoring an access stream, detecting patterns in that stream, and issuing prefetches based on those patterns. This paper examines the interplay between compiler-based and hardware-based prefetching techniques and asks whether either technique makes the other unnecessary. First, the compiler's ability to achieve good results without extreme expertise is evaluated by preparing binaries with no prefetch, one-flag prefetch (no tuning), and expertly tuned prefetch. From runs of SPEC CPU2006 binaries, we find that expertise avoids minor slowdowns in a few benchmarks and provides substantial speedups in others. We then compare software schemes with hardware prefetching schemes; our simulations show that software alone substantially outperforms hardware alone on about half of a selection of benchmarks. While hardware matches or exceeds software in a few cases, software is better on average. Analysis reveals that in many cases hardware fails to prefetch access patterns that it is capable of recognizing, because of irregularities in the observed miss sequence. Hardware outperforms software on address sequences that the compiler cannot predict. In general, while software is better at prefetching individual loads, hardware partly compensates by identifying more loads to prefetch. Using the two schemes together provides further benefit, but less than the sum of the contributions of each alone. © 2012 World Scientific Publishing Company.
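
As an illustration of the observation that irregularities in the miss sequence can keep hardware from prefetching a pattern it could otherwise recognize, the following is a minimal sketch of a toy stride detector. It is not the prefetcher simulated in the paper: the function name `stride_prefetches`, the `confirm` threshold, and the single-stream tracking policy are all assumptions made for this example.

```python
def stride_prefetches(addresses, confirm=2):
    """Return the addresses a toy stride prefetcher would request.

    Hypothetical policy: a prefetch for (addr + stride) is issued only
    after the same non-zero stride has been observed `confirm` times
    in a row. Only one stream is tracked.
    """
    prefetches = []
    last_addr = None
    last_stride = None
    run = 0  # consecutive repetitions of the current stride
    for addr in addresses:
        if last_addr is not None:
            stride = addr - last_addr
            if stride != 0 and stride == last_stride:
                run += 1
            else:
                # A new stride starts a fresh run; a zero stride resets it.
                run = 1 if stride != 0 else 0
            last_stride = stride
            if run >= confirm:
                prefetches.append(addr + stride)
        last_addr = addr
    return prefetches


# A clean stride-64 miss stream is detected after two matching strides.
regular = [0, 64, 128, 192, 256]

# The same stride-64 stream with unrelated misses interleaved: the stride
# never repeats twice in a row, so this detector issues nothing.
irregular = [0, 64, 4096, 128, 192, 8192, 256, 320]

print(stride_prefetches(regular))    # [192, 256, 320]
print(stride_prefetches(irregular))  # []
```

In this sketch the underlying stride of 64 is present in both streams, but the interleaved misses reset the detector before it reaches the confirmation threshold. A compiler that derives the stride from the loop code would not be affected by such interleaving.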

Publication Source (Journal or Book title)

Journal of Circuits, Systems and Computers
