Performance Modeling of Matrix Multiplication on 3D Memory Integrated FPGA

Times cited: 2

Authors:

Singapura, Shreyas G. [1]

Panangadan, Anand [1]

Prasanna, Viktor K. [1]

Affiliation:

[1] Univ Southern Calif, Ming Hsieh Dept Elect Engn, Los Angeles, CA 90089 USA

Source:

2015 IEEE 29th International Parallel and Distributed Processing Symposium Workshops | 2015

DOI:

10.1109/IPDPSW.2015.133

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Recent advances in three-dimensional integrated circuits have enabled vertical stacks of memory to be integrated with an FPGA layer. Such architectures provide high-bandwidth, low-latency memory access, which benefits memory-intensive applications. We build a performance model of a representative 3D Memory Integrated FPGA architecture for matrix multiplication and use it to derive the peak performance of the algorithm in terms of throughput and energy efficiency. We evaluate the effect of different architecture parameters on performance and identify the critical bottlenecks. The parameters include the configuration of memory layers, vaults, and Through-Silicon Vias (TSVs). Our analysis indicates that memory is one of the major consumers of energy on such an architecture. We model memory activation scheduling on vaults for this application and show that it improves energy efficiency by 1.83x while maintaining a throughput of 200 GOPS. The 3D Memory Integrated FPGA model achieves a peak energy efficiency of 93 GOPS/J for a 16K x 16K matrix. We also compare the peak performance of a 2D architecture with that of the 3D architecture and observe a marginal improvement in both throughput and energy efficiency. Our analysis indicates that the bottleneck is the FPGA, which dominates the total computation time and energy consumption. In addition to matrix multiplication, which requires O(m^3) computational work, we also analyzed the class of applications that require O(m^2) work. In particular, for matrix transposition we found that the improvement is on the order of 3x in energy consumption and 7x in runtime. This indicates that the computation cost of an application must match its memory access time in order to exploit the large bandwidth of 3D memory.
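The closing argument of the abstract, that an O(m^2) kernel gains far more from 3D memory bandwidth than an O(m^3) one, can be illustrated with a simple roofline-style estimate. This is a minimal sketch and not the authors' performance model: the `Platform` class, the helper functions, and all parameter values are hypothetical. It assumes runtime is max(compute time, memory time), i.e. perfect overlap, and that matrix multiplication moves only 3m^2 elements (A and B read once, C written once), which is an ideal-reuse lower bound rather than the traffic of a real blocked design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical platform description (illustrative values, not from the paper)."""
    peak_ops_per_s: float         # sustained arithmetic throughput of the compute fabric
    bandwidth_bytes_per_s: float  # sustained memory bandwidth


def roofline_time(ops: float, bytes_moved: float, p: Platform) -> float:
    """Runtime estimate assuming computation and memory transfers fully overlap."""
    return max(ops / p.peak_ops_per_s, bytes_moved / p.bandwidth_bytes_per_s)


def matmul_cost(m: int, elem_bytes: int = 4) -> tuple[int, int]:
    """(operations, bytes) for multiplying two m x m matrices.

    2*m^3 counts one multiply and one add per inner-product term.
    3*m^2 elements of traffic is the ideal-reuse lower bound (A and B read
    once, C written once); a real blocked design moves more data.
    """
    return 2 * m**3, 3 * m**2 * elem_bytes


def transpose_cost(m: int, elem_bytes: int = 4) -> tuple[int, int]:
    """(operations, bytes) for an m x m transpose: no arithmetic,
    each element is read once and written once."""
    return 0, 2 * m**2 * elem_bytes


def speedup_from_bandwidth(cost, m: int, base: Platform, factor: float) -> float:
    """Speedup from multiplying memory bandwidth by `factor`, compute unchanged."""
    ops, nbytes = cost(m)
    faster = Platform(base.peak_ops_per_s, base.bandwidth_bytes_per_s * factor)
    return roofline_time(ops, nbytes, base) / roofline_time(ops, nbytes, faster)


if __name__ == "__main__":
    # Hypothetical baseline: 100 GOPS of compute and 20 GB/s of memory bandwidth.
    base = Platform(peak_ops_per_s=100e9, bandwidth_bytes_per_s=20e9)
    m = 16 * 1024
    for name, cost in (("matmul", matmul_cost), ("transpose", transpose_cost)):
        s = speedup_from_bandwidth(cost, m, base, factor=4.0)
        print(f"{name}: {s:.2f}x faster with 4x bandwidth")
```

With these hypothetical parameters, a 4x increase in bandwidth leaves the compute-bound multiplication time unchanged, while the memory-bound transposition becomes 4x faster. This mirrors the qualitative conclusion above. Taking the maximum of the two times is the optimistic choice; summing them instead would model no overlap between computation and memory access.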

Pages: 154-162

Number of pages: 9
