Efficient Characterization of Hidden Processor Memory Hierarchies

Cited by: 1
Authors
Cooper, Keith [1]
Xu, Xiaoran [1]
Affiliations
[1] Rice Univ, Houston, TX 77005 USA
Source
COMPUTATIONAL SCIENCE - ICCS 2018, PT III | 2018 / Vol. 10862
Keywords
Efficient characterization; Hidden memory hierarchies; Code performance; Portable tool; CACHE
DOI
10.1007/978-3-319-93713-7_27
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
A processor's memory hierarchy has a major impact on the performance of running code. However, computation increasingly runs on platforms, such as clouds and clusters used for large-scale computation, where the actual hardware characteristics are hidden from both the end user and the tools that mediate execution, such as a compiler, a JIT, and a runtime system. Even worse, in such environments, a single computation may use a collection of processors with dissimilar characteristics. Ignorance of the performance-critical parameters of the underlying system makes it difficult to improve performance by optimizing the code or adjusting runtime-system behavior; it also makes application performance harder to understand. To address this problem, we have developed a suite of portable tools that can efficiently derive many of the parameters of processor memory hierarchies, such as the number of levels and the effective capacity and latency of caches and TLBs, in a matter of seconds. The tools use a series of carefully considered experiments to produce and analyze cache response curves automatically. The tools are inexpensive enough to be used in a variety of contexts, including install-time, compile-time, or runtime adaptation and performance-understanding tools.
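The abstract does not describe the experiments themselves, so the following is only a generic sketch of how a cache response curve might be analyzed, not the authors' method. It assumes a pointer-chasing access pattern (a single random cycle, built here with Sattolo's algorithm) and a simple ratio threshold for spotting latency steps. The names `sattolo_cycle`, `find_capacity_steps`, and `jump_ratio`, and the sample curve, are hypothetical and exist only for illustration.

```python
import random


def sattolo_cycle(n, seed=0):
    """Return a permutation of range(n) that forms one cycle of length n.

    Chasing nxt[p] repeatedly visits every slot exactly once per lap,
    which is a common way to defeat hardware prefetching in latency tests.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def find_capacity_steps(sizes, latencies, jump_ratio=1.5):
    """Return the working-set sizes after which latency jumps sharply.

    A jump is flagged when a point's latency is at least jump_ratio times
    the previous point's latency (an assumed heuristic, not from the paper).
    """
    steps = []
    for k in range(1, len(sizes)):
        if latencies[k] >= jump_ratio * latencies[k - 1]:
            steps.append(sizes[k - 1])
    return steps


# Synthetic response curve, invented for illustration only (not measured data).
sizes_kib = [8, 16, 32, 64, 128, 256, 512, 1024]
latency_ns = [1.0, 1.0, 1.1, 4.0, 4.1, 4.2, 12.0, 12.5]

steps = find_capacity_steps(sizes_kib, latency_ns)
print("latency steps after (KiB):", steps)
```

On a real system the latencies would come from timing the pointer chase at each working-set size in a compiled language; in pure Python the interpreter overhead would hide the cache effects, which is why the curve here is synthetic.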
Pages: 335-349
Page count: 15