Accurately modeling the on-chip and off-chip GPU memory subsystem

Cited by: 12
Authors
Candel, Francisco [1]
Petit, Salvador [1]
Sahuquillo, Julio [1]
Duato, Jose [1]
Affiliations
[1] Univ Politecn Valencia, Dept Comp Engn, Valencia 46012, Spain
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2018 / Vol. 82
Keywords
Applied modeling and simulation; On-chip memory subsystem; Main memory controller; GDDR; Cache coherence protocol;
DOI
10.1016/j.future.2017.02.012
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Research on GPU architecture is becoming pervasive in both academia and industry because these architectures offer much more performance per watt than typical CPU architectures. This is the main reason why massive deployment of GPU multiprocessors is considered one of the most feasible solutions to attain exascale computing capabilities. The memory hierarchy of the GPU is a critical research topic, since its design goals differ widely from those of conventional CPU memory hierarchies. Researchers typically use detailed microarchitectural simulators to explore novel designs to better support GPGPU computing as well as to improve the performance of GPU and CPU-GPU systems. In this context, the memory hierarchy is a critical and continuously evolving subsystem. Unfortunately, the fast evolution of current memory subsystems deteriorates the accuracy of existing state-of-the-art simulators. This paper focuses on accurately modeling the entire (both on-chip and off-chip) GPU memory subsystem. For this purpose, we identify four main memory-related components that affect the overall performance accuracy. Three of them belong to the on-chip memory hierarchy: (i) memory request coalescing mechanisms, (ii) miss status holding registers, and (iii) the cache coherence protocol; the fourth component refers to the memory controller and GDDR memory working activity. To evaluate and quantify our claims, we accurately modeled the aforementioned memory components in an extended version of the state-of-the-art Multi2Sim heterogeneous CPU-GPU processor simulator. Experimental results show important deviations, which can change the final system performance provided by the simulation framework by up to a factor of three. The proposed GPU model has been compared and validated against the original framework and the results from a real AMD Southern-Islands 7870HD GPU. © 2017 Elsevier B.V. All rights reserved.
Pages: 510-519
Number of pages: 10