Modeling Cache Contention and Throughput of Multiprogrammed Manycore Processors

Cited by: 13
Authors
Chen, Xi E. [1]
Aamodt, Tor M. [2]
Affiliations
[1] NVIDIA Corp, Beaverton, OR 97006 USA
[2] Univ British Columbia, Dept Elect & Comp Engn, Vancouver, BC V6T 1Z4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Analytical modeling; cache contention; manycore; fine-grained multithreading; throughput; performance
DOI
10.1109/TC.2011.141
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper proposes an analytical model for accurately predicting the impact of contention on cache miss rates. The focus is multiprogrammed workloads running on multithreaded manycore architectures. This work addresses a key challenge facing earlier cache contention models as the number of concurrent threads exceeds the associativity of shared caches. The memory access characteristics of individual applications are obtained in isolation by profiling their circular sequences, and two new measures of access locality are proposed. An evaluation of this model in the context of a Niagara processor shows that it achieves an average 8.7 percent error in miss rate predictions, which improves upon the best prior model by 48.1x. This paper also presents a novel Markov chain throughput model. When the contention model is combined with the Markov chain model, throughput is estimated with an average error of 8.3 percent compared to detailed simulation. Moreover, the combined model tracks throughput sufficiently well to find the same optimized design point for application-specific workloads 65 times faster than detailed simulation. This paper also shows that the models accurately predict cache contention and throughput trends across various workloads on real hardware.
Pages: 913-927
Number of pages: 15