A Customized Processor for Energy Efficient Scientific Computing

Cited by: 5

Authors:

Sethia, Ankit [1]

Dasika, Ganesh [2]

Mudge, Trevor [1]

Mahlke, Scott [1]

Affiliations:

[1] Univ Michigan, Dept Elect Engn & Comp Sci, Ann Arbor, MI 48109 USA

[2] ARM Inc, Austin, TX 78746 USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2012 / Vol. 61 / No. 12

Funding:

US National Science Foundation;

Keywords:

Low-power design; hardware; SIMD processors; processor architectures; parallel processors; Graphics Processing Unit (GPU); throughput computing; scientific computing; ARCHITECTURE; PERFORMANCE; GPU;

DOI:

10.1109/TC.2012.144

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The rapid advancements in the computational capabilities of the graphics processing unit (GPU), as well as the deployment of general programming models for these devices, have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices have clearly changed the landscape of computing, two central problems arise. First, GPUs are designed and optimized for graphics applications, resulting in delivered performance that is far below peak for more general scientific and mathematical applications. Second, GPUs are power-hungry devices that often consume 100-300 watts, which restricts the scalability of the solution and requires expensive cooling. To combat these challenges, this paper presents the PEPSC architecture, an architecture customized for the domain of data-parallel, dense-matrix-style scientific applications where power efficiency is the central focus. PEPSC utilizes a combination of a 2D single-instruction multiple-data (SIMD) datapath, an intelligent dynamic prefetching mechanism, and a configurable SIMD control approach to increase execution efficiency over conventional GPUs. A single PEPSC core has a peak performance of 120 GFLOPs while consuming 2 W of power when executing modern scientific applications, which represents an increase in computation efficiency of more than 10X over existing GPUs.


Pages: 1711-1723

Page count: 13

