Thread Migration Prediction for Distributed Shared Caches

Cited by: 5
Authors
Shim, Keun Sup [1]
Lis, Mieszko [1]
Khan, Omer [2]
Devadas, Srinivas [1]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Univ Connecticut, Storrs, CT USA
Keywords
Parallel Architecture; Distributed Caches; Cache Coherence; Data Locality
DOI
10.1109/L-CA.2012.30
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Chip-multiprocessors (CMPs) have become the mainstream parallel architecture in recent years; for scalability reasons, designs with high core counts tend towards tiled CMPs with physically distributed shared caches. This naturally leads to a Non-Uniform Cache Access (NUCA) design, where on-chip access latencies depend on the physical distances between requesting cores and home cores where the data is cached. Improving data locality is thus key to performance, and several studies have addressed this problem using data replication and data migration. In this paper, we consider another mechanism, hardware-level thread migration. This approach, we argue, can better exploit shared data locality for NUCA designs by effectively replacing multiple round-trip remote cache accesses with a smaller number of migrations. High migration costs, however, make it crucial to use thread migrations judiciously; we therefore propose a novel, on-line prediction scheme which decides whether to perform a remote access (as in traditional NUCA designs) or to perform a thread migration at the instruction level. For a set of parallel benchmarks, our thread migration predictor improves the performance by 24% on average over the shared-NUCA design that only uses remote accesses.
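The trade-off the abstract describes (several round-trip remote accesses versus one costly migration) can be illustrated with a simple cost comparison. This is only a sketch under stated assumptions, not the authors' predictor, whose internals this record does not describe. The function name, the cycle counts, and the assumption that a run of consecutive accesses to one home core is known in advance are all hypothetical.

```python
def should_migrate(run_length: int,
                   remote_round_trip_cycles: int,
                   migration_cycles: int,
                   local_access_cycles: int = 0) -> bool:
    """Return True if migrating the thread is cheaper than remote accesses.

    All parameters are hypothetical illustration values:
      run_length               -- consecutive accesses to the same home core
      remote_round_trip_cycles -- cost of one remote cache access (round trip)
      migration_cycles         -- one-time cost of moving the thread context
      local_access_cycles      -- per-access cost once the thread is at the home core
    """
    # Staying put: every access pays a full round trip to the home core.
    remote_cost = run_length * remote_round_trip_cycles
    # Migrating: pay the migration once, then access the data locally.
    migrate_cost = migration_cycles + run_length * local_access_cycles
    return migrate_cost < remote_cost


if __name__ == "__main__":
    # With a 20-cycle round trip and a 100-cycle migration (made-up numbers),
    # a single access does not justify migrating, but a run of ten does.
    print(should_migrate(1, 20, 100))
    print(should_migrate(10, 20, 100))
```

In hardware the run length is not known ahead of time, which is why the abstract calls for an on-line predictor; the comparison above only shows why short runs favor remote access and long runs favor migration.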
Pages: 53-56
Number of pages: 4