Thread Batching for High-Performance Energy-Efficient GPU Memory Design

Cited by: 0
Authors
Li, Bing [1 ]
Mao, Mengjie [2 ]
Liu, Xiaoxiao [3 ]
Liu, Tao [4 ]
Liu, Zihao [4 ]
Wen, Wujie [4 ]
Chen, Yiran [1 ]
Li, Hai [1 ]
Affiliations
[1] Duke Univ, Dept Elect & Comp Engn, Durham, NC 27708 USA
[2] MathWorks Inc, Natick, MA USA
[3] AMD, Santa Clara, CA USA
[4] Florida Int Univ, Dept Elect & Comp Engn, Miami, FL 33174 USA
Funding
National Science Foundation (USA);
Keywords
GPU; memory partitioning; thread batch; warp scheduler; FAIRNESS;
DOI
10.1145/3330152
CLC Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Massive multi-threading in GPUs imposes tremendous pressure on memory subsystems. Because the thread-level parallelism of GPUs grows rapidly while peak memory bandwidth improves only slowly, memory has become a bottleneck for GPU performance and energy efficiency. In this article, we propose an integrated architectural scheme that optimizes memory accesses and thereby boosts the performance and energy efficiency of GPUs. First, we propose thread batch enabled memory partitioning (TEMP) to improve GPU memory access parallelism. In particular, TEMP groups multiple thread blocks that share the same set of pages into a thread batch and applies a page coloring mechanism to bind each streaming multiprocessor (SM) to dedicated memory banks. TEMP then dispatches each thread batch to an SM to ensure highly parallel memory-access streaming from the different thread blocks. Second, a thread batch-aware scheduling (TBAS) scheme is introduced to improve GPU memory access locality and to reduce contention on memory controllers and interconnection networks. Experimental results show that the integration of TEMP and TBAS achieves up to 10.3% performance improvement and 11.3% DRAM energy reduction across diverse GPU applications. We also evaluate the performance interference of mixed CPU+GPU workloads running on a heterogeneous system that employs the proposed schemes. Our results show that a simple solution can effectively ensure the efficient execution of both GPU and CPU applications.
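To make the TEMP idea from the abstract concrete, the following is a minimal Python sketch of how thread blocks might be grouped into thread batches by their shared page sets, how a page-coloring function could map pages to SM-dedicated bank sets, and how batches could be dispatched to SMs. All names and parameters here (PAGE_SIZE, NUM_SMS, group_thread_batches, page_color, dispatch) are illustrative assumptions, not the authors' implementation; the actual mechanism operates on physical page allocation and the hardware scheduler.

from collections import defaultdict

# Assumed parameters for illustration only.
PAGE_SIZE = 4096   # bytes per DRAM page (assumption)
NUM_SMS = 4        # streaming multiprocessors, one bank "color" per SM (assumption)


def pages_touched(addresses):
    # Set of page numbers covered by one thread block's accesses.
    return frozenset(addr // PAGE_SIZE for addr in addresses)


def group_thread_batches(block_accesses):
    # Group thread blocks that share the same page set into one thread batch.
    # block_accesses: dict mapping block id -> iterable of byte addresses.
    batches = defaultdict(list)
    for block_id, addresses in block_accesses.items():
        batches[pages_touched(addresses)].append(block_id)
    return list(batches.values())


def page_color(page_number):
    # Page coloring: low-order page-number bits select the bank set (color)
    # that is dedicated to one SM.
    return page_number % NUM_SMS


def dispatch(batches):
    # Round-robin thread batches across SMs. Under page coloring, the pages of
    # a batch would be allocated with its SM's color, so the batch's memory
    # traffic stays within the banks dedicated to that SM.
    return [(i % NUM_SMS, batch) for i, batch in enumerate(batches)]


if __name__ == "__main__":
    # Blocks 0 and 1 both touch pages {0, 1}, so they form one batch; block 2 is alone.
    accesses = {0: [0, 4096], 1: [128, 4224], 2: [8192]}
    batches = group_thread_batches(accesses)
    print(batches)                              # [[0, 1], [2]]
    print(dispatch(batches))                    # [(0, [0, 1]), (1, [2])]
    print([page_color(p) for p in range(4)])    # pages 0..3 map to colors 0..3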
Pages: 21