SMTp: An architecture for next-generation scalable multi-threading

Cited by: 5
Authors
Chaudhuri, M [1]
Heinrich, M [1]
Affiliations
[1] Cornell Univ, Comp Syst Lab, Ithaca, NY 14853 USA
Source
31ST ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS | 2004
DOI
10.1109/ISCA.2004.1310769
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
We introduce the SMTp architecture, an SMT processor augmented with a coherence protocol thread context that, together with a standard integrated memory controller, can enable the design of (among other possibilities) scalable cache-coherent hardware distributed shared memory (DSM) machines from commodity nodes. We describe the minor changes needed to a conventional out-of-order multithreaded core to realize SMTp, discussing issues related to both deadlock avoidance and performance. We then compare SMTp performance to that of various conventional DSM machines with normal SMT processors, both with and without integrated memory controllers. On configurations from 1 to 32 nodes, with 1 to 4 application threads per node, we find that SMTp delivers performance comparable to, and sometimes better than, machines with more complex integrated DSM-specific memory controllers. Our results also show that the protocol thread has extremely low pipeline overhead. Given the simplicity and the flexibility of the SMTp mechanism, we argue that next-generation multithreaded processors with integrated memory controllers should adopt this mechanism as a way of building less complex high-performance DSM multiprocessors.
Pages: 124-135
Page count: 12