Efficient Methods for Out-of-Order Load/Store Execution for High-Performance Soft Processors

Times cited: 0
Authors
Wong, Henry [1]
Betz, Vaughn [1]
Rose, Jonathan [1]
Affiliation
[1] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON M5S 1A1, Canada
Source
PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (FPT) | 2013
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As FPGAs continue to grow in size, it becomes increasingly feasible and desirable to build higher-performance soft processors. An out-of-order processor can raise performance while preserving the familiar single-threaded programming model. The ability to execute memory loads and stores out of order has a large impact on performance. However, this is difficult to achieve because the dependencies between stores and loads are not known until their addresses are computed. Out-of-order memory disambiguation is traditionally done with content-addressable memories (CAMs) in the load queue and store queue, but large CAMs are inefficient on FPGAs. Store Queue Index Prediction (SQIP) and NoSQ propose replacing CAMs with store-load forwarding prediction and load re-execution. We implement four memory disambiguation schemes (in-order, CAM, SQIP, and NoSQ) on a Stratix IV FPGA and evaluate their area and delay trade-offs. We find that CAM area and delay degrade quickly as the load/store queue size grows. In contrast, SQIP and NoSQ degrade little with queue size but incur area overhead for prediction and predictor-training hardware. SQIP and NoSQ use less area than CAMs beyond 32 and 16 load/store queue entries, respectively, and achieve a higher maximum frequency beyond 4 entries.
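The following is a minimal, hypothetical software model contrasting the two approaches the abstract describes. The first is an associative (CAM-style) search of the store queue. The second is a prediction-based scheme that reads a single predicted store-queue slot and then verifies the load by re-executing it in order at commit. All names (`Store`, `cam_forward`, `predicted_forward`, `execute_and_verify`, and so on) and the predictor format (load PC to store-queue index) are illustrative assumptions. This is not the paper's hardware design. It omits many details of SQIP and NoSQ; for example, NoSQ forwards without a store queue at all.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Store:
    seq: int             # program-order age; smaller means older
    addr: Optional[int]  # None until the store's address has been computed
    data: int


def cam_forward(sq: List[Store], load_seq: int, load_addr: int) -> Optional[int]:
    """CAM-style disambiguation: compare the load address against every
    older store with a known address and forward from the youngest match.
    Hardware does these comparisons in parallel; the loop models that."""
    best: Optional[Store] = None
    for st in sq:
        if st.seq < load_seq and st.addr == load_addr:
            if best is None or st.seq > best.seq:
                best = st
    return None if best is None else best.data


def predicted_forward(sq: List[Store], predictor: Dict[int, int],
                      load_pc: int, load_addr: int) -> Optional[int]:
    """Prediction-based forwarding (simplified, hypothetical): read only the
    one slot the predictor names, so there is no associative search."""
    idx = predictor.get(load_pc)
    if idx is None or not 0 <= idx < len(sq):
        return None  # predicted "no forwarding": read memory instead
    st = sq[idx]
    return st.data if st.addr == load_addr else None


def memory_at_commit(initial: Dict[int, int], sq: List[Store],
                     load_seq: int) -> Dict[int, int]:
    """Memory seen by an in-order re-execution at commit: every older store
    has been written in program order."""
    mem = dict(initial)
    for st in sorted(sq, key=lambda s: s.seq):
        if st.seq < load_seq and st.addr is not None:
            mem[st.addr] = st.data
    return mem


def retrain(predictor: Dict[int, int], sq: List[Store],
            load_pc: int, load_seq: int, load_addr: int) -> None:
    """After a mismatch, point the predictor at the youngest older store to
    the same address, or drop the entry if no older store matches."""
    older = [i for i, st in enumerate(sq)
             if st.seq < load_seq and st.addr == load_addr]
    if older:
        predictor[load_pc] = max(older, key=lambda i: sq[i].seq)
    else:
        predictor.pop(load_pc, None)


def execute_and_verify(memory, sq, predictor, load_pc, load_seq, load_addr):
    """Speculatively execute a load, then verify it by re-execution.
    Returns (speculative value, committed value, matched?)."""
    fwd = predicted_forward(sq, predictor, load_pc, load_addr)
    spec = fwd if fwd is not None else memory.get(load_addr, 0)
    correct = memory_at_commit(memory, sq, load_seq).get(load_addr, 0)
    if spec != correct:
        # A real pipeline would also squash; this model only retrains.
        retrain(predictor, sq, load_pc, load_seq, load_addr)
    return spec, correct, spec == correct


# Hypothetical scenario: stores 1 and 5 both write 0x100. The load (age 4)
# must receive store 1's value because store 5 is younger than the load.
sq = [Store(1, 0x100, 7), Store(3, 0x200, 9), Store(5, 0x100, 11)]
memory = {0x100: 0, 0x200: 0}
predictor = {0x40: 2}  # stale prediction pointing at the younger store

cam_value = cam_forward(sq, load_seq=4, load_addr=0x100)
first = execute_and_verify(memory, sq, predictor, 0x40, 4, 0x100)
second = execute_and_verify(memory, sq, predictor, 0x40, 4, 0x100)
print(cam_value, first, second)
```

In this model, the CAM search finds store 1 directly. The prediction-based path first forwards from the wrong slot. The commit-time re-execution then detects the mismatch, and the retrained predictor forwards correctly on the next execution. The structure mirrors the trade-off the abstract reports. `cam_forward` compares against every entry, so its cost grows with queue size. `predicted_forward` touches a single entry but needs predictor, training, and verification logic.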
Pages: 442-445
Number of pages: 4