Predicting inter-thread cache contention on a chip multi-processor architecture

Cited by: 168
Authors
Chandra, D [1]
Guo, F [1]
Kim, S [1]
Solihin, Y [1]
Affiliation
[1] N Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA
DOI
10.1109/HPCA.2005.27
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper studies the impact of L2 cache sharing on threads that simultaneously share the cache, on a Chip Multi-Processor (CMP) architecture. Cache sharing impacts threads non-uniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as sub-optimal throughput, cache thrashing, and thread starvation for threads that fail to occupy sufficient cache space to make good progress. Unfortunately, there is no existing model that allows extensive investigation of the impact of cache sharing. To allow such a study, we propose three performance models that predict the impact of cache sharing on co-scheduled threads. The input to our models is the isolated L2 cache stack distance or circular sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the models is the number of extra L2 cache misses for each thread due to cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation that implements a dual-core CMP architecture, on fourteen pairs of mostly SPEC benchmarks. The most accurate model, the Inductive Probability model, achieves an average error of only 3.9%. Finally, to demonstrate the usefulness and practicality of the model, a case study that details the relationship between an application's temporal reuse behavior and its cache sharing impact is presented.
Pages: 340 - 351
Page count: 12
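
The abstract names the isolated L2 stack distance profile as the models' input and the number of extra L2 misses as their output. As a rough illustration of what such a profile is, the sketch below builds one for a single cache set under LRU replacement and reads a miss count from it. The function names, the toy trace, and the idea of handing the thread a fixed number of "effective" ways are assumptions made here for illustration. This is not one of the paper's three models, which estimate the effect of sharing in their own ways.

```python
def stack_distance_profile(trace, assoc):
    """Return counters [C_1, ..., C_assoc, C_>assoc] for one cache set.

    counters[i] for i < assoc counts accesses that hit at LRU stack
    position i (0 = most recently used). counters[assoc] counts accesses
    that would miss in an assoc-way set, including cold misses.
    """
    stack = []                        # most recently used line first
    counters = [0] * (assoc + 1)
    for line in trace:
        if line in stack:
            pos = stack.index(line)   # 0-based LRU stack position
            counters[min(pos, assoc)] += 1
            stack.pop(pos)
        else:
            counters[assoc] += 1      # first touch: cold miss
        stack.insert(0, line)         # accessed line becomes MRU
    return counters


def misses_with_ways(counters, ways):
    """Misses if the thread could keep only `ways` lines in the set."""
    return sum(counters[ways:])


# Hypothetical trace of line addresses that all map to the same set.
trace = ["A", "B", "A", "C", "B", "A"]
profile = stack_distance_profile(trace, assoc=2)

alone = misses_with_ways(profile, 2)    # thread has both ways to itself
shared = misses_with_ways(profile, 1)   # assumed: a co-runner takes one way
extra_misses = shared - alone
```

Under LRU, an access that hits at stack position i still hits in any cache with more than i ways, so one profile gathered in isolation yields miss counts for every smaller allocation without re-running the trace.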