Hierarchical Cache Directory for CMP

Cited by: 21
Authors
Guo, Song-Liu [1]
Wang, Hai-Xia [2]
Xue, Yi-Bo [2]
Li, Chong-Min [1]
Wang, Dong-Sheng [1, 2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Natl Lab Informat Sci & Technol, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
cache coherence protocol; hierarchical directory; chip multiprocessor; architecture
DOI
10.1007/s11390-010-9321-5
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
As more processing cores are integrated on a single chip and feature sizes continue to shrink, the average latency of accessing remote nodes under a directory-based coherence protocol increases, which significantly degrades system performance. Previous techniques such as data replication and data migration optimize the performance of the requesting core but offer little improvement for neighboring nodes. Other techniques, such as in-transit optimization, reduce latency at the cost of increased storage. This paper introduces a hierarchical cache directory for chip multiprocessors (CMPs), which divides the CMP tiles hierarchically into multiple regions and combines this organization with data replication. A new directory organization is proposed to record the sharing status within a region and to help the regional home complete operations efficiently. Simulation results show that, for a 16-core CMP, the hierarchical cache directory reduces average access latency by 9% and on-chip network traffic by 34% on average compared with a traditional directory, while using less storage. Theoretical analysis shows that, for a 2^n × 2^n tiled CMP, the average access latency of the hierarchical cache directory asymptotically approaches a function that is independent of n; hence the architecture is highly scalable.
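One way to picture the hierarchical division of a 2^n × 2^n tile grid mentioned in the abstract is a quadtree over tile coordinates. The Python sketch below illustrates only that idea. It assumes each region splits into four equal quadrants at every level, which the abstract does not confirm, and it does not reproduce the paper's region layout, directory format, or protocol. The function names `region_path` and `smallest_shared_region` are hypothetical.

```python
def region_path(x, y, n):
    """Quadrant index (0-3) of tile (x, y) at each level, coarsest first.

    Assumes a 2^n x 2^n grid split into four quadrants per level
    (an illustrative assumption, not taken from the paper).
    """
    path = []
    for level in range(n - 1, -1, -1):
        qx = (x >> level) & 1  # which half along x at this level
        qy = (y >> level) & 1  # which half along y at this level
        path.append(qy * 2 + qx)
    return path


def smallest_shared_region(a, b):
    """Side length, in tiles, of the smallest quadrant region holding both tiles."""
    (ax, ay), (bx, by) = a, b
    # The highest bit in which the coordinates differ decides the level
    # at which the two tiles fall into different quadrants.
    diff = (ax ^ bx) | (ay ^ by)
    return 1 << diff.bit_length()


if __name__ == "__main__":
    # 4 x 4 grid (n = 2), i.e. a 16-tile layout.
    print(region_path(2, 3, 2))
    print(smallest_shared_region((0, 0), (1, 1)))
    print(smallest_shared_region((1, 0), (2, 0)))
```

Under this assumption, two tiles that share a small region could in principle resolve a request at a nearby regional home rather than at a distant chip-wide home; whether and how the paper does this is not stated in the abstract.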
Pages: 246-256
Number of pages: 11
Related papers
50 in total
  • [1] Hierarchical Cache Directory for CMP
    Song-Liu Guo
    Hai-Xia Wang
    Yi-Bo Xue
    Chong-Min Li
    Dong-Sheng Wang
    Journal of Computer Science and Technology, 2010, 25 : 246 - 256
  • [2] Hierarchical Cache Directory for CMP
    郭松柳
    王海霞
    薛一波
    李崇民
    汪东升
    Journal of Computer Science & Technology, 2010, 25 (02) : 246 - 256
  • [3] A fault-tolerant directory-based cache coherence protocol for CMP architectures
    Fernandez-Pascual, Ricardo
    Garcia, Jose M.
    Acacio, Manuel E.
    Duato, Jose
    2008 IEEE INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS & NETWORKS WITH FTCS & DCC, 2008, : 267 - +
  • [4] Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling
    Caheny, Paul
    Casas, Marc
    Moreto, Miguel
    Gloaguen, Herve
    Saintes, Maxime
    Ayguade, Eduard
    Labarta, Jesus
    Valero, Mateo
    2016 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES (PACT), 2016, : 275 - 286
  • [5] Pthreads Performance Characteristics on Shared Cache CMP, Private Cache CMP and SMP
    Tan, Ian K. T.
    Chai, Ian
    Hoong, Poo Kuan
    2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATIONS: ICCEA 2010, PROCEEDINGS, VOL 1, 2010, : 186 - 191
  • [6] PS directory: a scalable multilevel directory cache for CMPs
    Joan J. Valls
    Alberto Ros
    Julio Sahuquillo
    María E. Gómez
    The Journal of Supercomputing, 2015, 71 : 2847 - 2876
  • [7] PS directory: a scalable multilevel directory cache for CMPs
    Valls, Joan J.
    Ros, Alberto
    Sahuquillo, Julio
    Gomez, Maria E.
    JOURNAL OF SUPERCOMPUTING, 2015, 71 (08): : 2847 - 2876
  • [8] A new cache directory scheme
    Wu, YG
    Muntz, RR
    SECOND INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS, AND NETWORKS (I-SPAN '96), PROCEEDINGS, 1996, : 466 - 472
  • [9] Segment directory: An improvement to the pointer in directory cache coherence schemes
    Department of Electrical Engineering, Korea Adv. Inst. Sci. and Technol., 373-1 Kusong-Dong Yusong-Gu, Taejon, 305-701, Korea, Republic of
    Parallel Processing Letters, 1998, 8 (04): : 577 - 588
  • [10] Segment directory enhancing the limited directory cache coherence schemes
    Choi, Jong Hyuk
    Park, Kyu Ho
    Proceedings of the International Parallel Processing Symposium, IPPS, 1999, : 258 - 267