Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling

Cited by: 7
Authors
Caheny, Paul [1, 3]
Casas, Marc [1, 3]
Moreto, Miguel [1, 3]
Gloaguen, Herve [2]
Saintes, Maxime [2]
Ayguade, Eduard [1, 3]
Labarta, Jesus [1, 3]
Valero, Mateo [1, 3]
Affiliations
[1] Barcelona Supercomp Ctr, Barcelona, Spain
[2] Univ Politecn Cataluna, Dept Arquitectura Comp, Barcelona, Spain
[3] Bull Atos Technol, Les Clayes Sous Bois, France
Funding
European Union Horizon 2020
Keywords
Cache Coherence; NUMA; Task-based programming models; Architecture
DOI
10.1145/2967938.2967962
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol combined with runtime-managed NUMA-aware scheduling and data allocation techniques to make the most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 1.23x to 2.54x and coherence traffic reductions between 44% and 77% in comparison to NUMA-oblivious scheduling and data allocation. Furthermore, we show that the NUMA-aware techniques we employ at the runtime level are crucial to ensure the added hierarchical layer in the directory coherence protocol does not introduce significant coherence traffic to the system.
Pages: 275 - 286
Number of pages: 12
Related papers
50 in total
  • [21] Zero Directory Eviction Victim: Unbounded Coherence Directory and Core Cache Isolation
    Chaudhuri, Mainak
    2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 277 - 290
  • [22] Exploring grouped coherence for clustered hierarchical cache
    Hu, Sensen
    Shi, Feng
    Ji, Weixing
    Chen, Xu
    Talpur, Shahnawaz
    JOURNAL OF SUPERCOMPUTING, 2017, 73 (09): : 4137 - 4157
  • [23] Exploring grouped coherence for clustered hierarchical cache
    Sensen Hu
    Feng Shi
    Weixing Ji
    Xu Chen
    Shahnawaz Talpur
    The Journal of Supercomputing, 2017, 73 : 4137 - 4157
  • [24] Directory Based Cache Coherence Modeller in Multiprocessors: Medium Insight
    Arora, Harsh
    Mukherjee, Rijubrata
    Bej, Abhijit
    Adak, Hillol
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2014, : 2611 - 2617
  • [25] WiDir: A Wireless-Enabled Directory Cache Coherence Protocol
    Franques, Antonio
    Kokolis, Apostolos
    Abadal, Sergi
    Fernando, Vimuth
    Misailovic, Sasa
    Torrellas, Josep
    2021 27TH IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2021), 2021, : 304 - 317
  • [26] A New Approach to Directory Based Solution for Cache Coherence Problem
    Mittal, Shaily
    Nitin
    2014 3RD INTERNATIONAL CONFERENCE ON ECO-FRIENDLY COMPUTING AND COMMUNICATION SYSTEMS (ICECCS 2014), 2014, : 9 - 13
  • [27] Reducing remote conflict misses: NUMA with remote cache versus COMA
    Zhang, Z
    Torrellas, J
    THIRD INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE - PROCEEDINGS, 1997, : 272 - 281
  • [28] Locality-Based Cache Management and Warp Scheduling for Reducing Cache Contention in GPU
    Fang, Juan
    Wei, Zelin
    Yang, Huijing
    MICROMACHINES, 2021, 12 (10)
  • [29] MASI: An Eviction Aware Cache Coherence Protocol for CMPs
    Dalui, Mamata
    Som, Tannishtha
    Bansal, Shivani
    Pant, Shivam
    Sikdar, Biplab K.
    2016 SIXTH INTERNATIONAL SYMPOSIUM ON EMBEDDED COMPUTING AND SYSTEM DESIGN (ISED 2016), 2016, : 249 - 253
  • [30] Thread Progress Aware Coherence Adaption for Hybrid Cache Coherence Protocols
    Li, Jianhua
    Shi, Liang
    Li, Qing'an
    Xue, Chun Jason
    Xu, Yinlong
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2014, 25 (10) : 2697 - 2707