Sharing-aware Efficient Private Caching in Many-core Server Processors

Cited by: 2
Authors
Shukla, Sudhanshu [1]
Chaudhuri, Mainak [1]
Affiliations
[1] Indian Inst Technol, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India
Keywords
Many-core server processors; Private victim caches; Sharing-aware private caching
DOI
10.1109/ICCD.2017.85
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The general-purpose cache-coherent many-core server processors are usually designed with a per-core private cache hierarchy and a large shared multi-banked last-level cache (LLC). The round-trip latency and the volume of traffic through the on-die interconnect between the per-core private cache hierarchy and the shared LLC banks can be significantly large. As a result, optimized private caching is important in such architectures. Traditionally, the private cache hierarchy in these processors treats the private and the shared blocks equally. We, however, observe that elimination of all non-compulsory non-coherence core cache misses to a small subset of shared code and data blocks can save a large fraction of the core requests to the LLC, indicating large potential for reducing the interconnect traffic in such architectures. We architect a specialized exclusive per-core private L2 cache which serves as a victim cache for the per-core private L1 cache. The proposed victim cache selectively captures a subset of the L1 cache victims. Our best selective victim caching proposal is driven by an online partitioning of the L1 cache victims based on two distinct features, namely, an estimate of sharing degree and an indirect simple estimate of reuse distance. Our proposal learns the collective reuse probability of the blocks in each partition on-the-fly and decides the victim caching candidates based on these probability estimates. Detailed simulation results on a 128-core system running a selected set of multi-threaded commercial and scientific computing applications show that our best victim cache design proposal at 64 KB capacity, on average, saves 44.1% core cache miss requests sent to the LLC and 10.6% execution cycles compared to a baseline system that has no private L2 cache.
In contrast, a traditional 128 KB non-inclusive LRU L2 cache saves 42.2% core cache misses sent to the LLC compared to the same baseline while performing slightly worse than the proposed 64 KB victim cache. In summary, our proposal outperforms the traditional design and enjoys lower interconnect traffic while halving the space investment for the per-core private L2 cache. Further, the savings in core cache misses achieved due to introduction of the proposed victim cache are observed to be only 8% less than an optimal victim cache design at 32 KB and 64 KB capacity points.
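The selective admission idea in the abstract (partition L1 victims by two features, learn a per-partition reuse probability online, and admit victims accordingly) can be illustrated with a minimal software sketch. This is not the authors' hardware design: the class name `SelectiveVictimFilter`, the sharer-count buckets, the use of L1 hit count as the reuse-distance proxy, and the 0.5 admission threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict


class SelectiveVictimFilter:
    """Toy admission filter for L1 victims. All parameters are illustrative."""

    def __init__(self, threshold=0.5):
        # Hypothetical admission threshold on the learned reuse probability.
        self.threshold = threshold
        # Per-partition counters: victims observed, and victims later reused.
        self.observed = defaultdict(int)
        self.reused = defaultdict(int)

    @staticmethod
    def partition(sharer_count, l1_hits):
        """Map a victim to a partition using two coarse features."""
        # Feature 1 (assumed buckets): private, few sharers, many sharers.
        if sharer_count <= 1:
            sharing = 0
        elif sharer_count <= 4:
            sharing = 1
        else:
            sharing = 2
        # Feature 2 (assumed proxy for reuse distance): was the block
        # hit at least once while it was resident in the L1 cache?
        reuse = 1 if l1_hits > 0 else 0
        return (sharing, reuse)

    def reuse_probability(self, part):
        """Fraction of this partition's observed victims that were reused."""
        seen = self.observed[part]
        # With no history, admit optimistically so the partition can be learned.
        return 1.0 if seen == 0 else self.reused[part] / seen

    def should_admit(self, part):
        """Admit a victim if its partition's reuse estimate meets the threshold."""
        return self.reuse_probability(part) >= self.threshold

    def record(self, part, was_reused):
        """Update the partition's counters with an observed outcome."""
        self.observed[part] += 1
        if was_reused:
            self.reused[part] += 1
```

In use, a simulator would call `partition` on each L1 eviction, consult `should_admit` before placing the block in the victim cache, and call `record` once the outcome for that block is known. How outcomes are observed for blocks that were not admitted is left open in this sketch.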
Pages: 485-492
Page count: 8
Related papers
50 in total
  • [1] Efficient Fault Simulation on Many-Core Processors
    Kochte, Michael A.
    Schaal, Marcel
    Wunderlich, Hans-Joachim
    Zoellin, Christian G.
    PROCEEDINGS OF THE 47TH DESIGN AUTOMATION CONFERENCE, 2010, : 380 - 385
  • [2] THERMAL-AWARE POWER MIGRATION IN MANY-CORE PROCESSORS
    Raghu, Avinash
    Karajgikar, Saket
    Agonafer, Dereje
    Sammakia, Bahgat
    PROCEEDINGS OF THE ASME INTERNATIONAL MECHANICAL ENGINEERING CONGRESS AND EXPOSITION 2010, VOL 4, 2012, : 397 - 404
  • [3] Using the Spring Physical Model to Extend a Cooperative Caching Protocol for Many-Core Processors
    Dahmani, Safae
    Cudennec, Loic
    Louise, Stephane
    Gogniat, Guy
    2014 IEEE 8TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SOCS (MCSOC), 2014, : 303 - 310
  • [4] Economic models for many-core processors
    Kumar, Rakesh
    DR DOBBS JOURNAL, 2008, 33 (03): : 10 - 10
  • [5] SAC: Sharing-Aware Caching in Multi-Chip GPUs
    Zhang, Shiqing
    Naderan-Tahan, Mahmood
    Jahre, Magnus
    Eeckhout, Lieven
    PROCEEDINGS OF THE 2023 THE 50TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2023, 2023, : 605 - 617
  • [6] A PGAS Execution Model for Efficient Stencil Computation on Many-Core Processors
    Ikei, Mitsuru
    Sato, Mitsuhisa
    2014 14TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2014, : 305 - 314
  • [7] Efficient Parallel Framework for HEVC Motion Estimation on Many-Core Processors
    Yan, Chenggang
    Zhang, Yongdong
    Xu, Jizheng
    Dai, Feng
    Zhang, Jun
    Dai, Qionghai
    Wu, Feng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2014, 24 (12) : 2077 - 2089
  • [8] Process Variation-Aware Floorplanning for 3D Many-Core Processors
    Hong, Hyejeong
    Lim, Jaeil
    Kang, Sungho
    2012 IEEE ELECTRICAL DESIGN OF ADVANCED PACKAGING AND SYSTEMS SYMPOSIUM (EDAPS), 2012, : 193 - 196
  • [9] Federated Scheduling in Clustered Many-core Processors
    Koike, Ryotaro
    Azumi, Takuya
    PROCEEDINGS OF THE 2021 IEEE/ACM 25TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT 2021), 2021,
  • [10] Location-Aware Cache Management for Many-Core Processors with Deep Cache Hierarchy
    Park, Jongsoo
    Yoo, Richard M.
    Khudia, Daya S.
    Hughes, Christopher J.
    Kim, Daehyun
    2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,