Controlling the power and area of neural branch predictors for practical implementation in high-performance processors

Cited by: 0

Authors:

Jimenez, Daniel A. [1]

Loh, Gabriel H. [1]

Affiliation:

[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA

Source:

SBAC-PAD 2006: 18TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING | 2006

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Neural-inspired branch predictors achieve very low branch misprediction rates. However, previously proposed implementations have a variety of characteristics that make them challenging to implement in future high-performance processors. In particular, the original Perceptron branch predictor suffers from a long access latency, and the faster path-based neural predictor (PBNP) requires deep pipelining and additional area to support checkpointing for misprediction recovery. The complexity of the PBNP stems from the fact that the path-history length, which determines the number of tables and pipeline stages, is equal to the outcome-history length, which is typically very long for high accuracy. We propose to decouple the path-history length from the outcome-history length through a new technique called modulo-path history. By allowing a shorter path history, we can implement a PBNP with significantly fewer tables and pipeline stages while still exploiting a traditional long branch outcome history. The pipeline length reduction results in decreased power and implementation complexity. We also propose folded modulo-path history to allow the number of pipeline stages to differ from the path-history length. We show that our modulo-path PBNP at 8KB can achieve prediction accuracy and overall performance within 0.8% (SPECint) of the original PBNP while simultaneously reducing predictor energy consumption by approximately 29% per access and predictor die area by approximately 35%. Our folded modulo-path history PBNP achieves performance within 1.3% of ideal, with an approximately 37% energy reduction and an approximately 36% predictor area reduction.
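The decoupling described in the abstract can be illustrated with a small functional sketch. It is not the authors' implementation: it ignores pipelining, checkpointing, and the folded variant, and it is only one plausible reading of the abstract. It assumes that the weight for outcome-history position i is read from table i mod P, indexed by the i mod P most recent branch address, so only P tables are needed for h history bits. The class name `ModuloPathPredictor`, the table size, the training threshold `theta`, and the weight range are all hypothetical choices made for illustration.

```python
class ModuloPathPredictor:
    """Functional, non-pipelined sketch: path history of P addresses,
    outcome history of h bits, with P < h (assumed layout, see lead-in)."""

    def __init__(self, hist_len=8, path_len=4, rows=64, theta=10, wmax=127):
        assert hist_len % path_len == 0, "sketch assumes P divides h"
        self.h, self.p, self.rows = hist_len, path_len, rows
        self.theta, self.wmax = theta, wmax  # illustrative values, not from the paper
        self.bias = [0] * rows
        # P tables instead of h; each row holds h/P weights.
        self.tables = [[[0] * (hist_len // path_len) for _ in range(rows)]
                       for _ in range(path_len)]
        self.outcomes = [False] * hist_len  # index 0 = most recent outcome
        self.path = [0] * path_len          # index 0 = most recent branch address

    def _weight_slots(self):
        # History position i maps to table (i mod P), slot (i div P).
        for i in range(self.h):
            j = i % self.p
            yield i, self.tables[j][self.path[j] % self.rows], i // self.p

    def _output(self, pc):
        y = self.bias[pc % self.rows]
        for i, row, k in self._weight_slots():
            y += row[k] if self.outcomes[i] else -row[k]
        return y

    def predict(self, pc):
        return self._output(pc) >= 0

    def update(self, pc, taken):
        y = self._output(pc)
        # Perceptron-style training: on a misprediction or a low-confidence output.
        if (y >= 0) != taken or abs(y) <= self.theta:
            t = 1 if taken else -1
            b = pc % self.rows
            self.bias[b] = max(-self.wmax, min(self.wmax, self.bias[b] + t))
            for i, row, k in self._weight_slots():
                x = 1 if self.outcomes[i] else -1
                row[k] = max(-self.wmax, min(self.wmax, row[k] + t * x))
        # Shift both histories; the path history stays short (P entries).
        self.outcomes = [taken] + self.outcomes[:-1]
        self.path = [pc] + self.path[:-1]
```

With `hist_len=8` and `path_len=4`, the sketch allocates four weight tables rather than eight while still summing over all eight outcome bits, which is the table-count reduction the abstract attributes to a shorter path history.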

Citation

Pages: 55 / +

Page count: 2
