Decoupled software pipelining with the synchronization array

Cited by: 59
Authors
Rangan, R [1]
Vachharajani, N [1]
Vachharajani, M [1]
August, DI [1]
Affiliations
[1] Princeton Univ, Dept Comp Sci, Princeton, NJ 08544 USA
Source
13TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES, PROCEEDINGS | 2004
Keywords
DOI
10.1109/PACT.2004.1342552
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Despite the success of instruction-level parallelism (ILP) optimizations in increasing the performance of microprocessors, certain codes remain elusive. In particular, codes containing recursive data structure (RDS) traversal loops have been largely immune to ILP optimizations, due to the fundamental serialization and variable latency of the loop-carried dependence through a pointer-chasing load. To address these and other situations, we introduce decoupled software pipelining (DSWP), a technique that statically splits a single-threaded sequential loop into multiple non-speculative threads, each of which performs useful computation essential for overall program correctness. The resulting threads execute on thread-parallel architectures such as simultaneous multithreaded (SMT) cores or chip multiprocessors (CMP), expose additional instruction-level parallelism, and tolerate latency better than the original single-threaded RDS loop. To reduce overhead, these threads communicate using a synchronization array, a dedicated hardware structure for pipelined inter-thread communication. DSWP used in conjunction with the synchronization array achieves an 11% to 76% speedup in the optimized functions on both statically and dynamically scheduled processors.
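The loop-splitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `work`, `sequential_sum`, and `pipelined_sum` are hypothetical, and a bounded software queue (`queue.Queue`) stands in for the hardware synchronization array. The sketch splits a linked-list loop into a traversal thread, which owns the pointer-chasing dependence, and a computation thread, which consumes nodes in order. Both threads do required work, so nothing is speculative. Because of Python's global interpreter lock, this shows only the structure of the transformation, not its speedup.

```python
import queue
import threading


class Node:
    """Singly linked list node (hypothetical stand-in for an RDS element)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list that preserves the order of `values`."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def work(value):
    """Per-node computation; an arbitrary placeholder."""
    return value * value


def sequential_sum(head):
    """Original single-threaded loop: traversal and work are interleaved."""
    total = 0
    node = head
    while node is not None:
        total += work(node.value)
        node = node.next  # loop-carried pointer-chasing dependence
    return total


_DONE = object()  # sentinel marking the end of the stream


def pipelined_sum(head, capacity=32):
    """Same loop split into two threads connected by a bounded queue.

    The queue is an assumed software substitute for the synchronization
    array; `capacity` bounds how far traversal may run ahead.
    """
    channel = queue.Queue(maxsize=capacity)
    result = []

    def traverse():
        # Stage 1: only the pointer chase; produces nodes in list order.
        node = head
        while node is not None:
            channel.put(node)
            node = node.next
        channel.put(_DONE)

    def compute():
        # Stage 2: consumes nodes and performs the per-node work.
        total = 0
        while True:
            node = channel.get()
            if node is _DONE:
                break
            total += work(node.value)
        result.append(total)

    threads = [threading.Thread(target=traverse),
               threading.Thread(target=compute)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

Data flows in one direction only, from the traversal stage to the computation stage, so the traversal thread never waits on the computation thread unless the queue is full. This one-way decoupling is what lets the first stage run ahead and absorb variable load latency.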
Pages: 177-188
Page count: 12