Performance Modeling of Matrix Multiplication on 3D Memory Integrated FPGA

Cited by: 2
Authors
Singapura, Shreyas G. [1]
Panangadan, Anand [1]
Prasanna, Viktor K. [1]
Affiliation
[1] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089 USA
DOI
10.1109/IPDPSW.2015.133
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Recent advances in three-dimensional integrated circuits have enabled vertical stacks of memory to be integrated with an FPGA layer. Such architectures provide high-bandwidth, low-latency access to memory, which benefits memory-intensive applications. We build a performance model of a representative 3D Memory Integrated FPGA architecture for matrix multiplication. We derive the peak performance of the algorithm on this model in terms of throughput and energy efficiency. We evaluate the effect of different architecture parameters on performance and identify the critical bottlenecks. The parameters include the configuration of memory layers, vaults, and Through Silicon Vias (TSVs). Our analysis indicates that memory is one of the major consumers of energy on such an architecture. We model memory activation scheduling on vaults for this application and show that it improves energy efficiency by 1.83x while maintaining a throughput of 200 GOPS/s. The 3D Memory Integrated FPGA model achieves a peak performance of 93 GOPS/J for a matrix of size 16K x 16K. We also compare the peak performance of a 2D architecture with that of the 3D architecture and observe a marginal improvement in both throughput and energy efficiency. Our analysis indicates that the bottleneck is the FPGA, which dominates the total computation time and energy consumption. In addition to matrix multiplication, which requires O(m^3) computation work, we also analyzed the class of applications that require O(m^2) work. In particular, for matrix transposition we found an improvement on the order of 3x in energy consumption and 7x in runtime. This indicates that the computation cost of the application must match the memory access time in order to exploit the large bandwidth of 3D memory.
Pages: 154-162
Number of pages: 9
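The abstract's closing point, that an application's computation cost must match its memory access time before extra memory bandwidth helps, can be illustrated with a rough back-of-the-envelope sketch. This is not the paper's performance model. The operation and transfer counts assume ideal on-chip reuse, and `PEAK_OPS_PER_S`, `BANDWIDTH_ELEMS_PER_S`, and all function names are hypothetical values and helpers chosen only for illustration.

```python
# Illustrative sketch only: NOT the performance model from the paper.
# All constants below are hypothetical and chosen for demonstration.

PEAK_OPS_PER_S = 200e9         # hypothetical compute rate (operations per second)
BANDWIDTH_ELEMS_PER_S = 10e9   # hypothetical memory rate (matrix elements per second)


def matmul_counts(m: int) -> tuple[int, int]:
    """Return (operations, element transfers) for dense m x m multiplication.

    Assumes 2*m**3 operations (one multiply and one add per inner step) and
    3*m**2 transfers (read A and B once, write C once), i.e. ideal reuse.
    """
    return 2 * m**3, 3 * m**2


def transpose_counts(m: int) -> tuple[int, int]:
    """Return (operations, element transfers) for m x m transposition.

    Assumes one move per element (m**2) and one read plus one write
    per element (2*m**2 transfers).
    """
    return m**2, 2 * m**2


def intensity(ops: int, elems: int) -> float:
    """Operations performed per element moved to or from memory."""
    return ops / elems


def limiting_resource(ops: int, elems: int) -> str:
    """Report whether compute time or memory time dominates."""
    compute_time = ops / PEAK_OPS_PER_S
    memory_time = elems / BANDWIDTH_ELEMS_PER_S
    return "compute" if compute_time >= memory_time else "memory"


if __name__ == "__main__":
    m = 16 * 1024  # the 16K x 16K matrix size mentioned in the abstract
    for name, counts in (("matmul", matmul_counts(m)),
                         ("transpose", transpose_counts(m))):
        ops, elems = counts
        print(name, intensity(ops, elems), limiting_resource(ops, elems))
```

Under these assumed numbers, the O(m^3) kernel is limited by compute, so additional memory bandwidth changes little, while the O(m^2) kernel is limited by memory. That is consistent in direction with the marginal gain reported for matrix multiplication and the larger gain reported for transposition, though the actual magnitudes depend on the paper's architecture parameters.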