A Customized Processor for Energy Efficient Scientific Computing

Cited by: 5

Authors:

Sethia, Ankit [1]

Dasika, Ganesh [2]

Mudge, Trevor [1]

Mahlke, Scott [1]

Affiliations:

[1] University of Michigan, Department of Electrical Engineering and Computer Science, Ann Arbor, MI 48109, USA

[2] ARM Inc., Austin, TX 78746, USA

Source:

IEEE Transactions on Computers | 2012 / Vol. 61 / No. 12

Funding:

U.S. National Science Foundation (NSF)

Keywords:

Low-power design; hardware; SIMD processors; processor architectures; parallel processors; Graphics Processing Unit (GPU); throughput computing; scientific computing; ARCHITECTURE; PERFORMANCE; GPU;

DOI:

10.1109/TC.2012.144

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812

Abstract:

The rapid advancements in the computational capabilities of the graphics processing unit (GPU), together with the deployment of general-purpose programming models for these devices, have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices have clearly changed the landscape of computing, two central problems arise. First, GPUs are designed and optimized for graphics applications, so their delivered performance on more general scientific and mathematical applications is far below peak. Second, GPUs are power-hungry devices that often consume 100-300 watts, which restricts the scalability of the solution and requires expensive cooling. To address these challenges, this paper presents the PEPSC architecture, an architecture customized for the domain of data-parallel, dense-matrix-style scientific applications in which power efficiency is the central focus. PEPSC combines a 2D single-instruction multiple-data (SIMD) datapath, an intelligent dynamic prefetching mechanism, and a configurable SIMD control approach to increase execution efficiency over conventional GPUs. A single PEPSC core has a peak performance of 120 GFLOPs while consuming 2 W of power when executing modern scientific applications, which represents an increase in computation efficiency of more than 10x over existing GPUs.
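As a quick check on the abstract's figures, the sketch below works out the efficiency they imply. It assumes that "computation efficiency" means peak throughput divided by power and takes the stated 120 GFLOPs, 2 W, and "more than 10x" gain at face value; the names `gflops_per_watt` and `implied_gpu_ceiling` are hypothetical and do not come from the paper.

```python
# Efficiency arithmetic using only figures quoted in the abstract.
# Assumption (not from the paper): efficiency = peak throughput / power.

PEPSC_PEAK_GFLOPS = 120.0  # peak performance of one PEPSC core
PEPSC_POWER_W = 2.0        # power consumed by one PEPSC core
CLAIMED_GAIN = 10.0        # lower bound of the "more than 10x" efficiency gain


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Return computational efficiency in GFLOPs per watt."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return gflops / watts


pepsc_eff = gflops_per_watt(PEPSC_PEAK_GFLOPS, PEPSC_POWER_W)

# If PEPSC is more than CLAIMED_GAIN times as efficient, the compared GPUs
# must deliver less than this many GFLOPs per watt (hypothetical derivation).
implied_gpu_ceiling = pepsc_eff / CLAIMED_GAIN

print(f"PEPSC: {pepsc_eff:.0f} GFLOPs/W; implied GPU ceiling: {implied_gpu_ceiling:.0f} GFLOPs/W")
# prints "PEPSC: 60 GFLOPs/W; implied GPU ceiling: 6 GFLOPs/W"
```

Under these assumptions one PEPSC core reaches 60 GFLOPs/W, which would place the GPUs in the comparison below roughly 6 GFLOPs/W; the paper's own measurements may define efficiency on delivered rather than peak throughput.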

Pages: 1711-1723

Number of pages: 13

Number of references: 27
