Active Memory Cube: A processing-in-memory architecture for exascale systems

Cited by: 128
Authors
Nair, R. [1]
Antao, S. F. [1]
Bertolli, C. [1]
Bose, P. [1]
Brunheroto, J. R. [1]
Chen, T. [1]
Cher, C.-Y. [1]
Costa, C. H. A. [1]
Doi, J. [2]
Evangelinos, C. [3]
Fleischer, B. M. [1]
Fox, T. W. [1]
Gallo, D. S. [4]
Grinberg, L. [5]
Gunnels, J. A. [1]
Jacob, A. C. [1]
Jacob, P. [1]
Jacobson, H. M. [1]
Karkhanis, T. [1]
Kim, C. [1]
Moreno, J. H. [1]
O'Brien, J. K. [1]
Ohmacht, M. [1]
Park, Y. [1]
Prener, D. A. [1]
Rosenburg, B. S. [1]
Ryu, K. D. [6]
Sallenave, O. [1]
Serrano, M. J. [1]
Siegl, P. D. M. [7]
Sugavanam, K. [1]
Sura, Z. [1]
Affiliations
[1] IBM Res Div, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] IBM Res Div, Tokyo, Japan
[3] IBM Res Div, Cambridge, MA 02142 USA
[4] IBM Res Brazil, BR-04007900 Sao Paulo, Brazil
[5] IBM Res Div, Thomas J Watson Res Ctr, Cambridge, MA 02142 USA
[6] LG Elect, Software Platform Lab, Seoul, South Korea
[7] Tech Univ Carolo Wilhelmina Braunschweig, Chair Chip Design Embedded Comp C3E, D-38106 Braunschweig, Germany
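Keywords
Compendex;
DOI
10.1147/JRD.2015.2409732
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract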
Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing 10^18 floating-point operations per second), consuming no more than 20 MW in power, by around the year 2020. This paper outlines a new architecture, the Active Memory Cube, which reduces the energy of computation significantly by performing computation in the memory module, rather than moving data through large memory hierarchies to the processor core. The architecture leverages a commercially demonstrated 3D memory stack called the Hybrid Memory Cube, placing sophisticated computational elements on the logic layer below its stack of dynamic random-access memory (DRAM) dies. The paper also describes an Active Memory Cube tuned to the requirements of a scientific exascale system. The computational elements have a vector architecture and are capable of performing a comprehensive set of floating-point and integer instructions, predicated operations, and gather-scatter accesses across memory in the Cube. The paper outlines the software infrastructure used to develop applications and to evaluate the architecture, and describes results of experiments on application kernels, along with performance and power projections.
Pages: 14
Related papers
20 in total
  • [1] [Anonymous], 2011, IBM BLUE GEN TEAM BL, V23
  • [2] NEAR-DATA PROCESSING: INSIGHTS FROM A MICRO-46 WORKSHOP
    Balasubramonian, Rajeev
    Chang, Jichuan
    Manning, Troy
    Moreno, Jaime H.
    Murphy, Richard
    Nair, Ravi
    Swanson, Steven
    [J]. IEEE MICRO, 2014, 34 (04) : 36 - 42
  • [3] Bohrer P., 2004, Performance Evaluation Review, V31, P8, DOI 10.1145/1054907.1054910
  • [4] DESIGN OF ION-IMPLANTED MOSFETS WITH VERY SMALL PHYSICAL DIMENSIONS
    DENNARD, RH
    GAENSSLEN, FH
    YU, HN
    RIDEOUT, VL
    BASSOUS, E
    LEBLANC, AR
    [J]. IEEE JOURNAL OF SOLID-STATE CIRCUITS, 1974, SC 9 (05) : 256 - 268
  • [5] The International Exascale Software Project roadmap
    Dongarra, Jack
    Beckman, Pete
    Moore, Terry
    Aerts, Patrick
    Aloisio, Giovanni
    Andre, Jean-Claude
    Barkai, David
    Berthou, Jean-Yves
    Boku, Taisuke
    Braunschweig, Bertrand
    Cappello, Franck
    Chapman, Barbara
    Chi, Xuebin
    Choudhary, Alok
    Dosanjh, Sudip
    Dunning, Thom
    Fiore, Sandro
    Geist, Al
    Gropp, Bill
    Harrison, Robert
    Hereld, Mark
    Heroux, Michael
    Hoisie, Adolfy
    Hotta, Koh
    Jin, Zhong
    Ishikawa, Yutaka
    Johnson, Fred
    Kale, Sanjay
    Kenway, Richard
    Keyes, David
    Kramer, Bill
    Labarta, Jesus
    Lichnewsky, Alain
    Lippert, Thomas
    Lucas, Bob
    Maccabe, Barney
    Matsuoka, Satoshi
    Messina, Paul
    Michielse, Peter
    Mohr, Bernd
    Mueller, Matthias S.
    Nagel, Wolfgang E.
    Nakashima, Hiroshi
    Papka, Michael E.
    Reed, Dan
    Sato, Mitsuhisa
    Seidel, Ed
    Shalf, John
    Skinner, David
    Snir, Marc
    [J]. INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2011, 25 (01) : 3 - 60
  • [6] The LINPACK benchmark: past, present and future
    Dongarra, JJ
    Luszczek, P
    Petitet, A
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2003, 15 (09) : 803 - 820
  • [7] Exascale computing: What future architectures will mean for the user community
    Gara, Alan
    Nair, Ravi
    [J]. PARALLEL COMPUTING: FROM MULTICORES AND GPU'S TO PETASCALE, 2010, 19 : 3 - 15
  • [8] Giampapa M., 2010, Proceedings of the 2010 International Conference for High Performance Computing, Networking, Storage and Analysis (SC), P1, DOI 10.1109/SC.2010.22
  • [9] Jeddeloh J., 2012, 2012 IEEE Symposium on VLSI Technology, P87, DOI 10.1109/VLSIT.2012.6242474
  • [10] A fast and high quality multilevel scheme for partitioning irregular graphs
    Karypis, G
    Kumar, V
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1998, 20 (01) : 359 - 392