Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling

Cited by: 7

Authors:

Caheny, Paul [1,3]

Casas, Marc [1,3]

Moreto, Miguel [1,3]

Gloaguen, Herve [2]

Saintes, Maxime [2]

Ayguade, Eduard [1,3]

Labarta, Jesus [1,3]

Valero, Mateo [1,3]

Affiliations:

[1] Barcelona Supercomp Ctr, Barcelona, Spain

[2] Univ Politecn Cataluna, Dept Arquitectura Comp, Barcelona, Spain

[3] Bull Atos Technol, Les Clayes Sous Bois, France

Source:

2016 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES (PACT) | 2016

Funding:

European Union Horizon 2020;

Keywords:

Cache Coherence; NUMA; Task-based programming models; ARCHITECTURE;

DOI:

10.1145/2967938.2967962

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol, combined with runtime-managed NUMA-aware scheduling and data allocation techniques to make the most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 1.23x to 2.54x and coherence traffic reductions between 44% and 77% in comparison to NUMA-oblivious scheduling and data allocation. Furthermore, we show that the NUMA-aware techniques we employ at the runtime level are crucial to ensure that the added hierarchical layer in the directory coherence protocol does not introduce significant coherence traffic to the system.


Pages: 275-286

Page count: 12
