Sharing-aware Efficient Private Caching in Many-core Server Processors

Cited by: 2

Authors:

Shukla, Sudhanshu [1]

Chaudhuri, Mainak [1]

Affiliations:

[1] Indian Inst Technol, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India

Source:

2017 IEEE 35TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD) | 2017

Keywords:

Many-core server processors; Private victim caches; Sharing-aware private caching

DOI:

10.1109/ICCD.2017.85

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The general-purpose cache-coherent many-core server processors are usually designed with a per-core private cache hierarchy and a large shared multi-banked last-level cache (LLC). The round-trip latency and the volume of traffic through the on-die interconnect between the per-core private cache hierarchy and the shared LLC banks can be significantly large. As a result, optimized private caching is important in such architectures. Traditionally, the private cache hierarchy in these processors treats the private and the shared blocks equally. We, however, observe that elimination of all non-compulsory non-coherence core cache misses to a small subset of shared code and data blocks can save a large fraction of the core requests to the LLC indicating large potential for reducing the interconnect traffic in such architectures. We architect a specialized exclusive per-core private L2 cache which serves as a victim cache for the per-core private L1 cache. The proposed victim cache selectively captures a subset of the L1 cache victims. Our best selective victim caching proposal is driven by an online partitioning of the L1 cache victims based on two distinct features, namely, an estimate of sharing degree and an indirect simple estimate of reuse distance. Our proposal learns the collective reuse probability of the blocks in each partition on-the-fly and decides the victim caching candidates based on these probability estimates. Detailed simulation results on a 128-core system running a selected set of multi-threaded commercial and scientific computing applications show that our best victim cache design proposal at 64 KB capacity, on average, saves 44.1% core cache miss requests sent to the LLC and 10.6% execution cycles compared to a baseline system that has no private L2 cache. In contrast, a traditional 128 KB non-inclusive LRU L2 cache saves 42.2% core cache misses sent to the LLC compared to the same baseline while performing slightly worse than the proposed 64 KB victim cache. In summary, our proposal outperforms the traditional design and enjoys lower interconnect traffic while halving the space investment for the per-core private L2 cache. Further, the savings in core cache misses achieved due to introduction of the proposed victim cache are observed to be only 8% less than an optimal victim cache design at 32 KB and 64 KB capacity points.
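
The selective admission idea described in the abstract (partition L1 victims by two features, learn a per-partition reuse probability online, admit only victims from promising partitions) can be illustrated with a minimal sketch. This is not the authors' design: the class name `SelectiveVictimFilter`, the bucketing in `partition`, the `threshold` cutoff, and the `warmup` sample count are all hypothetical choices made for illustration, since the abstract does not specify the actual features, counters, or decision rule.

```python
from collections import defaultdict


class SelectiveVictimFilter:
    """Toy admission filter for a victim cache (illustrative assumption only)."""

    def __init__(self, threshold=0.25, warmup=4):
        self.threshold = threshold       # assumed admission cutoff
        self.warmup = warmup             # admit everything until enough samples
        self.victims = defaultdict(int)  # L1 victims observed per partition
        self.reuses = defaultdict(int)   # victims later re-referenced per partition

    @staticmethod
    def partition(sharer_count, l1_hits):
        # Hypothetical features: a coarse sharing-degree bucket and the
        # number of L1 hits as a crude stand-in for reuse distance.
        return (min(sharer_count, 2), min(l1_hits, 3))

    def reuse_probability(self, part):
        # Fraction of this partition's victims that were reused so far.
        seen = self.victims[part]
        return self.reuses[part] / seen if seen else 1.0

    def should_admit(self, part):
        # Admit unconditionally during warm-up, then compare the learned
        # reuse probability against the assumed threshold.
        if self.victims[part] < self.warmup:
            return True
        return self.reuse_probability(part) >= self.threshold

    def on_victim(self, part):
        # Call when the L1 cache evicts a block belonging to `part`.
        self.victims[part] += 1

    def on_reuse(self, part):
        # Call when a previously evicted block of `part` is requested again.
        self.reuses[part] += 1
```

In this sketch, a simulator would call `on_victim` at each L1 eviction, `on_reuse` when an evicted block is requested again, and `should_admit` to decide whether the victim enters the private L2 victim cache. A hardware design would presumably use small saturating counters rather than unbounded integers.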


Pages: 485-492

Page count: 8
