Doctor of Philosophy (PhD)
Electrical and Computer Engineering
General purpose processors were once designed with the major goal of maximizing performance. As power consumption has grown, with the advent of multi-core processors and the rising importance of embedded and mobile devices, the importance of designing efficient and low cost architectures has increased. This dissertation focuses on reducing the complexity of the front end of the processor, mainly branch predictors. Branch predictors have also been designed with a focus on improving prediction accuracy so that performance is maximized. To accomplish this, the predictors proposed in the literature and used in real systems have become increasingly complex and larger, a trend that is inconsistent with the anticipated trend of simpler and more numerous cores in future processors. Much of the increased complexity in many recently proposed predictors is used to select a part of history most correlated to a branch. This makes them costly, if not impossible to implement practically. We suggest that the complex decisions do not have to be made in hardware at prediction or run time and can be moved offline. High accuracy can be achieved by making complex prediction decisions in a one-time profile run instead of using complex hardware. We apply these techniques to Spotlight, our own low cost, low complexity branch predictor. A static analysis step determines, for each branch, the history segment yielding the highest accuracy. This information is placed in unused instruction space. Spotlight achieves higher accuracy than other implementation-simple predictors such as Gshare and YAGS and matches or outperforms the two complex neural predictors that we compare it to. To ensure timely access, we evaluate using a hardware table (called a BIT) to store profile bits after they are extracted from instructions, and the accuracy of using this table. The drawback of a BIT is its size. We introduce a novel technique, Preloading that places data for an instruction in prior blocks on the path to the instruction. By doing so, it is able to significantly reduce the size of the BIT needed for good performance. We discuss other applications of Preloading on the front end other than branch predictors.
Document Availability at the Time of Submission
Release the entire work immediately for access worldwide.
Verma, Santhosh, "Reducing complexity of processor front ends with static analysis and selective preloading" (2011). LSU Doctoral Dissertations. 2622.