Sharing-aware Efficient Private Caching in Many-core Server Processors

Cited by: 2

Authors:

Shukla, Sudhanshu [1]

Chaudhuri, Mainak [1]

Affiliations:

[1] Indian Inst Technol, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India

Source:

2017 IEEE 35TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD) | 2017

Keywords:

Many-core server processors; Private victim caches; Sharing-aware private caching;

DOI:

10.1109/ICCD.2017.85

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

General-purpose cache-coherent many-core server processors are usually designed with a per-core private cache hierarchy and a large shared multi-banked last-level cache (LLC). The round-trip latency and the volume of traffic through the on-die interconnect between the per-core private cache hierarchy and the shared LLC banks can be significantly large. As a result, optimized private caching is important in such architectures. Traditionally, the private cache hierarchy in these processors treats the private and the shared blocks equally. We, however, observe that elimination of all non-compulsory non-coherence core cache misses to a small subset of shared code and data blocks can save a large fraction of the core requests to the LLC, indicating large potential for reducing the interconnect traffic in such architectures. We architect a specialized exclusive per-core private L2 cache which serves as a victim cache for the per-core private L1 cache. The proposed victim cache selectively captures a subset of the L1 cache victims. Our best selective victim caching proposal is driven by an online partitioning of the L1 cache victims based on two distinct features, namely, an estimate of sharing degree and an indirect simple estimate of reuse distance. Our proposal learns the collective reuse probability of the blocks in each partition on-the-fly and decides the victim caching candidates based on these probability estimates. Detailed simulation results on a 128-core system running a selected set of multi-threaded commercial and scientific computing applications show that our best victim cache design proposal at 64 KB capacity, on average, saves 44.1% of the core cache miss requests sent to the LLC and 10.6% of the execution cycles compared to a baseline system that has no private L2 cache. In contrast, a traditional 128 KB non-inclusive LRU L2 cache saves 42.2% of the core cache misses sent to the LLC compared to the same baseline while performing slightly worse than the proposed 64 KB victim cache. In summary, our proposal outperforms the traditional design and enjoys lower interconnect traffic while halving the space investment for the per-core private L2 cache. Further, the savings in core cache misses achieved by introducing the proposed victim cache are observed to be only 8% less than those of an optimal victim cache design at the 32 KB and 64 KB capacity points.
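The selection idea described in the abstract (partition the L1 victims by two features, learn a reuse probability per partition, admit victims from partitions with a high enough estimate) can be illustrated with a minimal sketch. This is not the authors' design: the class name `SelectiveVictimFilter`, the bucket boundaries, the use of an L1 hit count as a stand-in for the reuse-distance estimate, and the `threshold` and `min_samples` parameters are all assumptions chosen for illustration.

```python
from collections import defaultdict


class SelectiveVictimFilter:
    """Hypothetical admission filter for a private victim cache."""

    def __init__(self, threshold=0.25, min_samples=8):
        self.threshold = threshold      # assumed admission cutoff
        self.min_samples = min_samples  # assumed warm-up length per partition
        self.evicted = defaultdict(int)  # L1 victims observed per partition
        self.reused = defaultdict(int)   # victims that were requested again

    @staticmethod
    def partition(sharer_count, l1_hits):
        # Assumed features: a coarse sharing-degree bucket and a one-bit
        # reuse hint (was the block hit at all while it lived in the L1?).
        sharing = 0 if sharer_count <= 1 else (1 if sharer_count <= 4 else 2)
        reuse = 0 if l1_hits == 0 else 1
        return (sharing, reuse)

    def record_eviction(self, part):
        self.evicted[part] += 1

    def record_reuse(self, part):
        self.reused[part] += 1

    def reuse_probability(self, part):
        # Collective estimate: reused victims / observed victims.
        seen = self.evicted[part]
        return self.reused[part] / seen if seen else 0.0

    def should_admit(self, part):
        if self.evicted[part] < self.min_samples:
            return True  # still learning: admit so the partition can be sampled
        return self.reuse_probability(part) >= self.threshold


f = SelectiveVictimFilter()
shared = f.partition(sharer_count=8, l1_hits=3)   # widely shared, reused
private = f.partition(sharer_count=1, l1_hits=0)  # private, never reused
for i in range(10):
    f.record_eviction(shared)
    f.record_eviction(private)
    if i < 6:
        f.record_reuse(shared)  # 6 of 10 shared victims come back
print(f.should_admit(shared), f.should_admit(private))  # True False
```

In this toy run the shared partition reaches an estimated reuse probability of 0.6 and keeps being admitted, while the private partition stays at 0 and is filtered out once the warm-up samples have been seen. A hardware realization would use small saturating counters rather than unbounded dictionaries; that detail is also left out here.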

Citation:

Pages: 485-492

Page count: 8
