Thread Criticality Assisted Replication and Migration for Chip Multiprocessor Caches

Cited by: 2

Authors:

Li, Jianhua [1]

Li, Minming [2]

Xue, Chun Jason [2]

Ouyang, Yiming [1]

Shen, Fanfan [3]

Affiliations:

[1] Hefei Univ Technol, Sch Comp Sci & Informat, Hefei 230001, Anhui, Peoples R China

[2] City Univ Hong Kong, Dept Comp Sci, Kowloon, Hong Kong, Peoples R China

[3] Wuhan Univ, Comp Sch, Wuhan 430072, Hubei, Peoples R China

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2017 / Vol. 66 / No. 10

Keywords:

Chip multiprocessor; non-uniform cache; thread criticality; replication; migration; WIRE DELAY; CAPACITY; POWER

DOI:

10.1109/TC.2017.2705678

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Non-Uniform Cache Architecture (NUCA) is a viable solution to the large on-chip wire delay caused by the rapid increase in the cache capacity of chip multiprocessors (CMPs). When the last-level cache (LLC) is partitioned into smaller banks connected by an on-chip network, the access latency becomes non-uniform across banks. Prior work has thoroughly explored NUCA design, including block migration, block replication, and block searching. However, all of the previous mechanisms designed for NUCA are thread-oblivious when multi-threaded applications are deployed on CMP systems. Due to interference on shared resources, threads often progress at unbalanced rates, and the lagging threads are more critical to overall performance. In this paper, we propose a novel NUCA design called thread Criticality Assisted Replication and Migration (CARM). CARM uses runtime thread criticality information as hints to adjust block replication and migration in NUCA. Specifically, CARM aims to speed up parallel application execution by prioritizing block replication and migration for critical threads. Full-system experimental results show that CARM reduces the execution time of a set of PARSEC workloads by 13.7 and 6.8 percent on average compared with traditional D-NUCA and Re-NUCA, respectively. Moreover, CARM also consumes much less energy than the evaluated schemes.


Pages: 1747-1762

Page count: 16
