Towards a Universal FPGA Matrix-Vector Multiplication Architecture

被引:44
作者
Kestur, Srinidhi [1 ]
Davis, John D. [2 ]
Chung, Eric S. [2 ]
机构
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] Microsoft Res Silicon Valley, Mountain View, CA 94043 USA
来源
2012 IEEE 20TH ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM) | 2012年
关键词
FPGA; dense matrix; sparse matrix; spMV; reconfigurable computing;
D O I
10.1109/FCCM.2012.12
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
We present the design and implementation of a universal, single-bitstream library for accelerating matrix-vector multiplication using FPGAs. Our library handles multiple matrix encodings ranging from dense to multiple sparse formats. A key novelty in our approach is the introduction of a hardware-optimized sparse matrix representation called Compressed Variable-Length Bit Vector (CVBV), which reduces the storage and bandwidth requirements up to 43% (on average 25%) compared to compressed sparse row (CSR) across all the matrices from the University of Florida Sparse Matrix Collection. Our hardware incorporates a runtime-programmable decoder that performs on-the-fly decoding of various formats such as Dense, COO, CSR, DIA, and ELL. The flexibility and scalability of our design is demonstrated across two FPGA platforms: (1) the BEE3 (Virtex-5 LX155T with 16GB of DRAM) and (2) ML605 (Virtex-6 LX240T with 2GB of DRAM). For dense matrices, our approach scales to large data sets with over 1 billion elements, and achieves robust performance independent of the matrix aspect ratio. For sparse matrices, our approach using a compressed representation reduces the overall bandwidth while also achieving comparable efficiency relative to state-of-the-art approaches.
引用
收藏
页码:9 / 16
页数:8
相关论文
共 27 条
  • [1] [Anonymous], 1994, SPARSKIT BASIC TOOL
  • [2] Exploiting Matrix Symmetry to Improve FPGA-Accelerated Conjugate Gradient
    Bakos, Jason D.
    Nagar, Krishna K.
    [J]. PROCEEDINGS OF THE 2009 17TH IEEE SYMPOSIUM ON FIELD PROGRAMMABLE CUSTOM COMPUTING MACHINES, 2009, : 223 - 226
  • [3] Bell N, 2009, STUDENTS GUIDE TO THE MA TESOL, P1
  • [4] Buluc A., 2011, Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2011), P721, DOI 10.1109/IPDPS.2011.73
  • [5] Buluç A, 2009, SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, P233
  • [6] Chi-Chia Sun, 2010, Proceedings of the 2010 IEEE 10th International Conference on Computer and Information Technology (CIT 2010), P2306, DOI 10.1109/CIT.2010.397
  • [7] Davis J. D., 2009, MSRTR200945
  • [8] Davis T.A., 1994, NA DIGEST, V92
  • [9] Demmel J. W., 2002, Performance optimizations and bounds for sparse matrix-vector multiply, P26
  • [10] Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application
    Dubois, David
    Dubois, Andrew
    Boorman, Thomas
    Connor, Carolyn
    Poole, Steve
    [J]. ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2010, 3 (01)