Active Memory Cube: A processing-in-memory architecture for exascale systems

Cited by: 128

Authors:

Nair, R. ^{[1]}

Antao, S. F. ^{[1]}

Bertolli, C. ^{[1]}

Bose, P. ^{[1]}

Brunheroto, J. R. ^{[1]}

Chen, T. ^{[1]}

Cher, C.-Y. ^{[1]}

Costa, C. H. A. ^{[1]}

Doi, J. ^{[2]}

Evangelinos, C. ^{[3]}

Fleischer, B. M. ^{[1]}

Fox, T. W. ^{[1]}

Gallo, D. S. ^{[4]}

Grinberg, L. ^{[5]}

Gunnels, J. A. ^{[1]}

Jacob, A. C. ^{[1]}

Jacob, P. ^{[1]}

Jacobson, H. M. ^{[1]}

Karkhanis, T. ^{[1]}

Kim, C. ^{[1]}

Moreno, J. H. ^{[1]}

O'Brien, J. K. ^{[1]}

Ohmacht, M. ^{[1]}

Park, Y. ^{[1]}

Prener, D. A. ^{[1]}

Rosenburg, B. S. ^{[1]}

Ryu, K. D. ^{[6]}

Sallenave, O. ^{[1]}

Serrano, M. J. ^{[1]}

Siegl, P. D. M. ^{[7]}

Sugavanam, K. ^{[1]}

Sura, Z. ^{[1]}

Affiliations:

[1] IBM Res Div, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA

[2] IBM Res Div, Tokyo, Japan

[3] IBM Res Div, Cambridge, MA 02142 USA

[4] IBM Res Brazil, BR-04007900 Sao Paulo, Brazil

[5] IBM Res Div, Thomas J Watson Res Ctr, Cambridge, MA 02142 USA

[6] LG Elect, Software Platform Lab, Seoul, South Korea

[7] Tech Univ Carolo Wilhelmina Braunschweig, Chair Chip Design Embedded Comp C3E, D-38106 Braunschweig, Germany

Source:

IBM JOURNAL OF RESEARCH AND DEVELOPMENT | 2015 / Vol. 59 / No. 2-3

Keywords:

Compendex;

DOI:

10.1147/JRD.2015.2409732

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing 10^{18} floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of computation significantly by performing computation in the memory module, rather than moving data through large memory hierarchies to the processor core. The architecture leverages a commercially demonstrated 3D memory stack called the Hybrid Memory Cube, placing sophisticated computational elements on the logic layer below its stack of dynamic random-access memory (DRAM) dies. The paper also describes an Active Memory Cube tuned to the requirements of a scientific exascale system. The computational elements have a vector architecture and are capable of performing a comprehensive set of floating-point and integer instructions, predicated operations, and gather-scatter accesses across memory in the Cube. The paper outlines the software infrastructure used to develop applications and to evaluate the architecture, and describes results of experiments on application kernels, along with performance and power projections.


Pages: 14
