Thread Criticality Assisted Replication and Migration for Chip Multiprocessor Caches

Cited by: 2
Authors
Li, Jianhua [1]
Li, Minming [2]
Xue, Chun Jason [2]
Ouyang, Yiming [1]
Shen, Fanfan [3]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat, Hefei 230001, Anhui, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China
[3] Wuhan Univ, Comp Sch, Wuhan 430072, Hubei, Peoples R China
Keywords
Chip multiprocessor; non-uniform cache; thread criticality; replication; migration; WIRE DELAY; CAPACITY; POWER;
DOI
10.1109/TC.2017.2705678
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Non-Uniform Cache Architecture (NUCA) is a viable solution to the large on-chip wire delay caused by the rapid growth in cache capacity of chip multiprocessors (CMPs). NUCA partitions the last-level cache (LLC) into smaller banks connected by an on-chip network, so access latency is non-uniform across banks. Prior work has explored NUCA design extensively, including block migration, block replication, and block searching. However, all previous NUCA mechanisms are thread-oblivious when multi-threaded applications are deployed on CMP systems. Because of interference on shared resources, threads often progress unevenly, and the lagging threads are more critical to overall performance. In this paper, we propose a novel NUCA design called thread Criticality Assisted Replication and Migration (CARM). CARM uses runtime thread criticality information as hints to adjust block replication and migration in NUCA. Specifically, CARM aims to speed up parallel application execution by prioritizing block replication and migration for critical threads. Full-system experimental results show that CARM reduces the execution time of a set of PARSEC workloads by 13.7 and 6.8 percent on average compared with traditional D-NUCA and Re-NUCA, respectively. Moreover, CARM consumes much less energy than the evaluated schemes.
Pages: 1747-1762
Page count: 16
Related papers
34 in total
  • [1] Power Gating with Block Migration in Chip-Multiprocessor Last-Level Caches
    Kadjo, David
    Kim, Hyungjun
    Gratz, Paul
    Hu, Jiang
    Ayoub, Raid
    2013 IEEE 31ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2013, : 93 - 99
  • [2] An Algorithm for Parallel Execution of Loops in Chip Multiprocessor Caches
    Subha, S.
    2009 INTERNATIONAL CONFERENCE ON ADVANCES IN RECENT TECHNOLOGIES IN COMMUNICATION AND COMPUTING (ARTCOM 2009), 2009, : 85 - 89
  • [3] Thread Migration Prediction for Distributed Shared Caches
    Shim, Keun Sup
    Lis, Mieszko
    Khan, Omer
    Devadas, Srinivas
    IEEE COMPUTER ARCHITECTURE LETTERS, 2014, 13 (01) : 53 - 56
  • [4] Managing wire delay in large chip-multiprocessor caches
    Beckmann, BM
    Wood, DA
    MICRO-37 2004: 37TH ANNUAL INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, PROCEEDINGS, 2004, : 319 - 330
  • [5] A Flexible Chip Multiprocessor Simulator Dedicated for Thread Level Speculation
    Wang, Yaobin
    An, Hong
    Liu, Zhiqin
    Li, Ling
    Huang, Jun
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 2127 - 2132
  • [6] Design and Analysis of Location Caches in a NoC-Based Chip Multiprocessor System
    Ramakrishnan, D.
    Wu, Y. L.
    Jone, W. B.
    JOURNAL OF LOW POWER ELECTRONICS, 2010, 6 (02) : 240 - 262
  • [7] Improving chip multiprocessor reliability through code replication
    Ozturk, Ozcan
    COMPUTERS & ELECTRICAL ENGINEERING, 2010, 36 (03) : 480 - 490
  • [8] A Fair Thread-Aware Memory Scheduling Algorithm for Chip Multiprocessor
    Zhu, Danfeng
    Wang, Rui
    Wang, Hui
    Qian, Depei
    Luan, Zhongzhi
    Chu, Tianshu
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, PT 1, PROCEEDINGS, 2010, 6081 : 174 - 185
  • [9] An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor
    Oh, Taecheol
    Lee, Hyunjin
    Lee, Kiyeon
    Cho, Sangyeun
    2009 IEEE COMPUTER SOCIETY ANNUAL SYMPOSIUM ON VLSI, 2009, : 181 - 186
  • [10] Execution Migration in a Heterogeneous-ISA Chip Multiprocessor
    DeVuyst, Matthew
    Venkat, Ashish
    Tullsen, Dean M.
    ASPLOS XVII: SEVENTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2012, : 261 - 272