Exploring Memory Access Similarity to Improve Irregular Application Performance for Distributed Hybrid Memory Systems

Times cited: 1
Authors
Liu, Wenjie [1]
He, Xubin [1]
Liu, Qing [2]
Affiliations
[1] Temple Univ, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
[2] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07670 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
Behavioral sciences; Random access memory; Monitoring; Task analysis; Indium tin oxide; Operating systems; Data structures; Cluster; irregular application; memory system; DRAM; hybrid memory system;
DOI
10.1109/TPDS.2022.3227544
Chinese Library Classification number
TP301 [Theory, methods];
Subject classification code
081202;
Abstract
With increasing problem complexity, more irregular applications are deployed on high-performance clusters owing to their parallel working paradigm, and these applications yield irregular memory access behaviors across nodes. However, the irregularity of these memory access behaviors has not been comprehensively studied, which results in low utilization of the integrated hybrid memory system composed of stacked DRAM and off-chip DRAM. To address this problem, we devise a novel method called Similarity-Managed Hybrid Memory System (SM-HMS) that improves hybrid memory system performance by leveraging the memory access similarity among nodes in a cluster. SM-HMS comprises two techniques: Memory Access Similarity Measuring and Similarity-based Memory Access Behavior Sharing. To quantify memory access similarity, the memory access behaviors of each node are vectorized, and the distance between two vectors is used as the memory access similarity. The calculated similarity is then used to share memory access behaviors precisely across nodes. Using the shared memory access behaviors, SM-HMS divides the stacked DRAM into two sections: a sliding window section and an outlier section. The shared behaviors guide replacement in the sliding window section, while the outlier section is managed with LRU replacement. Evaluation results with a set of irregular applications on various clusters of up to 256 nodes show that SM-HMS outperforms the state-of-the-art approaches Cameo, Chameleon, and Hybrid2 in job finish time reduction by up to 58.6%, 56.7%, and 31.3%, respectively (46.1%, 41.6%, and 19.3% on average). SM-HMS also achieves up to 98.6% (91.9% on average) of the performance of an ideal hybrid memory system.
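The abstract describes the two mechanisms only at a high level. The Python sketch below illustrates them under stated assumptions, not as the SM-HMS implementation: each node's memory access behavior is vectorized as a normalized per-page access-frequency histogram, similarity is taken as Euclidean distance (the abstract does not name the metric), a node adopts the shared behavior of its most similar peer, and stacked DRAM is modeled as a sliding window section of predicted pages plus an LRU-ordered outlier section. All function and class names (`access_vector`, `distance`, `most_similar_node`, `StackedDram`) are hypothetical.

```python
import math
from collections import OrderedDict


def access_vector(page_accesses, num_pages):
    """Vectorize one node's access trace as a normalized per-page
    access-frequency histogram (an assumed representation)."""
    counts = [0] * num_pages
    for page in page_accesses:
        counts[page] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def distance(u, v):
    """Euclidean distance between two access vectors (assumed metric);
    a smaller distance means more similar memory access behavior."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar_node(me, vectors):
    """Return the peer whose access vector is closest to node `me`."""
    return min((n for n in vectors if n != me),
               key=lambda n: distance(vectors[me], vectors[n]))


class StackedDram:
    """Hypothetical two-section stacked DRAM model: a sliding window section
    filled from shared (predicted) pages and an LRU-managed outlier section."""

    def __init__(self, window_size, outlier_size):
        self.window_size = window_size
        self.outlier_size = outlier_size
        self.window = set()
        self.outlier = OrderedDict()  # insertion order doubles as LRU order

    def slide_window(self, predicted_pages):
        # Replace the window contents with the first pages the peer shared.
        self.window = set(list(predicted_pages)[: self.window_size])

    def access(self, page):
        """Return True on a stacked-DRAM hit, False if served by off-chip DRAM."""
        if page in self.window:
            return True
        if page in self.outlier:
            self.outlier.move_to_end(page)  # mark as most recently used
            return True
        self.outlier[page] = None  # bring the missed page into the outlier section
        if len(self.outlier) > self.outlier_size:
            self.outlier.popitem(last=False)  # evict the least recently used page
        return False


if __name__ == "__main__":
    traces = {
        "node0": [0, 1, 1, 2, 3],
        "node1": [0, 1, 2, 2, 3],
        "node2": [7, 7, 6, 6, 5],
    }
    vectors = {n: access_vector(t, num_pages=8) for n, t in traces.items()}
    print("most similar peer:", most_similar_node("node0", vectors))  # node1
    dram = StackedDram(window_size=2, outlier_size=1)
    dram.slide_window([1, 2, 3])  # pages the peer is predicted to touch next
    print([dram.access(p) for p in (1, 5, 5, 6, 5)])  # [True, False, True, False, False]
```

In this sketch, pages that the shared behavior does not predict still land in the outlier section, so a poor prediction degrades to plain LRU caching rather than to no caching at all.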
Pages: 797-809
Page count: 13
References
38 entries in total
[1] Adams M. Chombo Software Package for AMR Applications - Design Document, 2014.
[2] Asgari, Bahar; Hadidi, Ramyad; Cao, Jiashen; Shim, Da Eun; Lim, Sung-Kyu; Kim, Hyesoon. FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction. 2021 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2021), 2021, pp. 908-920.
[3] Bakhshalipour, Mohammad; Shakerinava, Mehran; Lotfi-Kamran, Pejman; Sarbazi-Azad, Hamid. Bingo Spatial Data Prefetcher. 2019 25th IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019, pp. 399-411.
[4] Chou C. International Symposium on Microarchitecture (MICRO), 2014, p. 1. DOI 10.1109/MICRO.2014.63.
[5] Dai Guangli. ICPP, 2021, p. 9.
[6] Dean, Jeffrey; Ghemawat, Sanjay. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 2008, 51(1), pp. 107-113.
[7] Diener Matthias. EuroMPI, 2017, p. 1.
[8] El-Ghazawi T. Supercomputing, ACM/IEEE 2002 Conference, 2002, p. 17.
[9] Fuglede B. 2004 IEEE International Symposium on Information Theory, Proceedings, 2004, p. 31.
[10] Gabriel E. Lecture Notes in Computer Science, vol. 3241, 2004, p. 97.