Efficient Characterization of Hidden Processor Memory Hierarchies

Cited by: 1
Authors
Cooper, Keith [1 ]
Xu, Xiaoran [1 ]
Affiliations
[1] Rice Univ, Houston, TX 77005 USA
Source
COMPUTATIONAL SCIENCE - ICCS 2018, PT III | 2018 / Vol. 10862
Keywords
Efficient characterization; Hidden memory hierarchies; Code performance; Portable tool; CACHE;
DOI
10.1007/978-3-319-93713-7_27
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Classification Code
081202;
Abstract
A processor's memory hierarchy has a major impact on the performance of running code. However, computing platforms in which the actual hardware characteristics are hidden from both the end user and the tools that mediate execution, such as compilers, JITs, and runtime systems, are increasingly common; large-scale computation in clouds and clusters is one example. Even worse, in such environments a single computation may use a collection of processors with dissimilar characteristics. Ignorance of the performance-critical parameters of the underlying system makes it difficult to improve performance by optimizing the code or adjusting runtime-system behavior; it also makes application performance harder to understand. To address this problem, we have developed a suite of portable tools that can efficiently derive many of the parameters of processor memory hierarchies, such as the number of levels and the effective capacity and latency of caches and TLBs, in a matter of seconds. The tools use a series of carefully considered experiments to produce and analyze cache response curves automatically. The tools are inexpensive enough to be used in a variety of contexts, including install time, compile time, runtime adaptation, and performance-understanding tools.
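The idea of analyzing a cache response curve can be illustrated with a minimal sketch. This is not the authors' tool or algorithm: the function name `find_level_boundaries`, the `jump_ratio` threshold, and the synthetic latency values are all assumptions made for illustration. The sketch takes (working-set size, average access latency) samples, such as a pointer-chasing microbenchmark might produce, and reports the sizes after which latency rises sharply, which roughly suggests the effective capacity of each level.

```python
from typing import List, Tuple

def find_level_boundaries(
    curve: List[Tuple[int, float]], jump_ratio: float = 1.5
) -> List[int]:
    """Return the working-set sizes (bytes) just before each sharp latency rise.

    `curve` is a list of (size_in_bytes, latency) samples sorted by size.
    `jump_ratio` is a hypothetical threshold: a step counts as a level
    boundary when the next latency is at least jump_ratio times the current one.
    """
    boundaries = []
    for (size, latency), (_, next_latency) in zip(curve, curve[1:]):
        if next_latency >= jump_ratio * latency:
            boundaries.append(size)
    return boundaries

# Synthetic response curve (illustrative values, not measurements):
# flat near 1.0 up to 32 KiB, near 4.0 up to 256 KiB, near 12.0 beyond.
KIB = 1024
synthetic_curve = [
    (16 * KIB, 1.0),
    (32 * KIB, 1.0),
    (64 * KIB, 4.0),
    (128 * KIB, 4.1),
    (256 * KIB, 4.0),
    (512 * KIB, 12.0),
    (1024 * KIB, 12.2),
]

levels = find_level_boundaries(synthetic_curve)
print(levels)
```

On measured data the transitions between levels are usually gradual and noisy rather than single clean steps, so a practical analysis would need smoothing or a more robust change-point test than this single-step ratio.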
Pages: 335-349
Page count: 15