UMH: A Hardware-Based Unified Memory Hierarchy for Systems with Multiple Discrete GPUs

Cited by: 12
Authors
Ziabari, Amir Kavyan [1]
Sun, Yifan [1]
Ma, Yenai [2]
Schaa, Dana [3]
Abellan, Jose L. [4]
Ubal, Rafael [1]
Kim, John [5]
Joshi, Ajay [2]
Kaeli, David [1]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, 360 Huntington Ave, Boston, MA 02115 USA
[2] Boston Univ, Dept Elect & Comp Engn, 8 St Marys St, Boston, MA 02215 USA
[3] Adv Micro Devices Inc, 1 AMD Pl, Sunnyvale, CA 94085 USA
[4] Univ Catolica San Antonio Murcia, Dept Comp Sci, Ave Jeronimos 135, Murcia 30107, Spain
[5] Korea Adv Inst Sci & Technol, Dept Comp Sci, 291 Daehak Ro, Daejeon, South Korea
Funding
U.S. National Science Foundation;
Keywords
Unified memory architecture; memory hierarchy; graphics processing units; high performance computing; ARCHITECTURE; DESIGN;
DOI
10.1145/2996190
Chinese Library Classification
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
In this article, we describe how to ease memory management between a Central Processing Unit (CPU) and one or multiple discrete Graphics Processing Units (GPUs) by architecting a novel hardware-based Unified Memory Hierarchy (UMH). With UMH, a GPU accesses the CPU memory only if it does not find its required data in the directories associated with its high-bandwidth memory, or if the NMOESI coherency protocol limits the access to that data. Using UMH with NMOESI improves performance of a CPU-multiGPU system by at least 1.92x in comparison to alternative software-based approaches. It also allows the CPU to access GPU-modified data at least 13x faster.
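The access rule stated in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from the article: the names `State`, `READABLE`, `WRITABLE`, and `serve_access`, the dictionary standing in for a directory, and in particular the choice of which NMOESI states permit a local read or write are assumptions made only for illustration.

```python
from enum import Enum


class State(Enum):
    """The six NMOESI block states; the names are the only part fixed by the acronym."""
    N = "non-coherent"
    M = "modified"
    O = "owned"
    E = "exclusive"
    S = "shared"
    I = "invalid"


# Assumption: states in which a block held in GPU high-bandwidth memory
# may be read or written locally. The actual protocol rules may differ.
READABLE = {State.N, State.M, State.O, State.E, State.S}
WRITABLE = {State.N, State.M, State.E}


def serve_access(directory, block, is_write):
    """Return 'hbm' if the access can be served locally, else 'cpu_memory'.

    `directory` is a hypothetical dict mapping block address -> State.
    """
    state = directory.get(block)
    if state is None:
        # Directory miss: the data is not in the GPU's high-bandwidth memory.
        return "cpu_memory"
    allowed = WRITABLE if is_write else READABLE
    # Directory hit, but the coherence state may still limit the access.
    return "hbm" if state in allowed else "cpu_memory"


if __name__ == "__main__":
    directory = {0x40: State.S, 0x80: State.M}
    print(serve_access(directory, 0x40, is_write=False))
    print(serve_access(directory, 0x40, is_write=True))
    print(serve_access(directory, 0xC0, is_write=False))
```

In this sketch a read of a shared block is served locally, while a write to the same block, or any access to a block absent from the directory, falls through to the CPU side.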
Pages: 25