Managing DRAM Latency Divergence in Irregular GPGPU Applications

Cited by: 50
Authors
Chatterjee, Niladrish [1, 2]
O'Connor, Mike [2, 4]
Loh, Gabriel H. [3 ]
Jayasena, Nuwan [3 ]
Balasubramonian, Rajeev [1 ]
Affiliations
[1] Univ Utah, Salt Lake City, UT 84112 USA
[2] NVIDIA, Santa Clara, CA USA
[3] Adv Micro Devices Inc AMD Res, Sunnyvale, CA USA
[4] Univ Texas Austin, Austin, TX 78712 USA
Source
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis | 2014
DOI
10.1109/SC.2014.16
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Memory controllers in modern GPUs aggressively reorder requests for high bandwidth usage, often interleaving requests from different warps. This leads to high variance in the latency of different requests issued by the threads of a warp. Since a warp in a SIMT architecture can proceed only when all of its memory requests are returned by memory, such latency divergence causes significant slowdown when running irregular GPGPU applications. To solve this issue, we propose memory scheduling mechanisms that avoid inter-warp interference in the DRAM system to reduce the average memory stall latency experienced by warps. We further reduce latency divergence through mechanisms that coordinate scheduling decisions across multiple independent memory channels. Finally, we show that carefully orchestrating the memory scheduling policy can achieve low average latency for warps without compromising bandwidth utilization. Our combined scheme yields a 10.1% performance improvement for irregular GPGPU workloads relative to a throughput-optimized GPU memory controller.
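The effect the abstract describes can be illustrated with a toy model. The sketch below is not the paper's scheduler. It assumes a single memory channel that serves exactly one request per time unit and ignores row-buffer effects, bank parallelism, and multiple channels. The function names (`warp_finish_times`, `average_stall`) and the two service orders are hypothetical, chosen only to show that a warp's stall time is set by its last-returning request.

```python
def warp_finish_times(service_order):
    """Map each warp id to the time its final request is served.

    Toy assumption: the channel serves one request per time unit,
    in exactly the order given by service_order.
    """
    finish = {}
    for t, warp in enumerate(service_order, start=1):
        # A warp can resume only after its last request returns,
        # so later requests overwrite earlier completion times.
        finish[warp] = t
    return finish


def average_stall(service_order):
    """Average time at which warps become ready to proceed."""
    finish = warp_finish_times(service_order)
    return sum(finish.values()) / len(finish)


# Two warps (0 and 1), four requests each, same total work in both orders.
interleaved = [0, 1, 0, 1, 0, 1, 0, 1]  # requests from the two warps alternate
grouped = [0, 0, 0, 0, 1, 1, 1, 1]      # each warp's requests served together

print(average_stall(interleaved))
print(average_stall(grouped))
```

Both orders take eight time units to drain, so bandwidth use is identical in this model, yet the grouped order lets warp 0 resume at time 4 instead of time 7. That gap is a simplified picture of why reducing inter-warp interleaving can lower average warp stall latency.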
Pages: 128-139
Page count: 12