Power-efficient prefetching on GPGPUs

Cited by: 7

Authors:

Falahati, Hajar [1]

Hessabi, Shaahin [1]

Abdi, Mania [1]

Baniasadi, Amirali [2]

Affiliations:

[1] Sharif Univ Technol, Tehran, Iran

[2] Univ Victoria, Victoria, BC, Canada

Source:

Journal of Supercomputing | 2015, Vol. 71, No. 8

Keywords:

GPGPU; Performance and power optimization; Global memory access; Prefetching; Idle processing element; Utilization

DOI:

10.1007/s11227-014-1331-6

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The graphics processing unit (GPU) is the most promising candidate platform for achieving faster improvements in peak processing speed, low latency and high performance. The highly programmable and multithreaded nature of GPUs makes them a remarkable candidate for general-purpose computing. However, supporting non-graphics computing on graphics processors requires addressing several architectural challenges. In this paper, we focus on improving performance by better hiding the long latency of transferring data from the slow global memory. Furthermore, we show that the proposed method can reduce power and energy consumption. Reducing the access time to off-chip data plays a noticeable role in reducing waiting time and the percentage of unutilized elements. Also, using processing elements appropriately to prefetch data during stall times bridges the memory gap in an energy-efficient manner, and consequently leads to lower power and energy consumption. Simulation results show that we can potentially improve instructions per cycle (IPC) by up to 24.76%. Moreover, results show that power, energy and energy efficiency improve by up to 22.47%, 24.72% and 36.01%, respectively.

Pages: 2808-2829

Page count: 22
