FSCHOL: An OpenCL-based HPC Framework for Accelerating Sparse Cholesky Factorization on FPGAs

Cited by: 2
Authors
Tavakoli, Erfan Bank [1]
Riera, Michael [1]
Quraishi, Masudul Hassan [1]
Ren, Fengbo [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
Source
2021 IEEE 33RD INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2021) | 2021
Keywords
Cholesky factorization; sparse matrix decomposition; FPGA; OpenCL; high-performance computing; reconfigurable computing;
DOI
10.1109/SBAC-PAD53543.2021.00032
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The proposed FSCHOL framework consists of two parts: an FPGA kernel implementing a throughput-optimized hardware architecture that accelerates the supernodal multifrontal algorithm for sparse Cholesky factorization, and a host program implementing a novel scheduling algorithm that finds the optimal execution order of supernode computations for an elimination tree on the FPGA, eliminating the need for off-chip memory access to store intermediate results. Moreover, the proposed scheduling algorithm minimizes on-chip memory requirements for buffering intermediate results by resolving the dependency of parent nodes in an elimination tree through temporal parallelism. Experimental results for factorizing a set of sparse matrices of various sizes from the SuiteSparse Matrix Collection show that FSCHOL, implemented on an Intel Stratix 10 GX FPGA development board, achieves on average 5.5x and 9.7x higher performance and 10.3x and 24.7x lower energy consumption than implementations of CHOLMOD on an Intel Xeon E5-2637 CPU and an NVIDIA V100 GPU, respectively.
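To make the scheduling idea in the abstract concrete, the following is a minimal sketch of why the execution order of nodes in an elimination tree affects how many intermediate results must be buffered. It is not the FSCHOL scheduling algorithm, which the abstract does not specify; it only uses the classical postorder traversal from the multifrontal literature. The parent-array encoding (with -1 marking a root) and the function names `postorder` and `peak_buffered_updates` are assumptions made for illustration.

```python
from collections import defaultdict


def postorder(parent):
    """Return a children-before-parent ordering of an elimination tree.

    parent[i] is the parent of node i, or -1 for a root
    (a hypothetical encoding chosen for this sketch).
    """
    children = defaultdict(list)
    roots = []
    for node, p in enumerate(parent):
        if p == -1:
            roots.append(node)
        else:
            children[p].append(node)

    order = []
    for root in roots:
        # Iterative depth-first traversal; a node is emitted only
        # after all of its children have been emitted.
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if expanded:
                order.append(node)
            else:
                stack.append((node, True))
                for child in reversed(children[node]):
                    stack.append((child, False))
    return order


def peak_buffered_updates(parent, order):
    """Count the most update matrices pending at once for a given order.

    Each non-root node produces one update matrix that stays buffered
    until its parent runs and consumes it.
    """
    n_children = [0] * len(parent)
    for p in parent:
        if p != -1:
            n_children[p] += 1

    done = set()
    buffered = 0
    peak = 0
    for node in order:
        # A node may only run once every child has produced its update.
        assert all(c in done for c, p in enumerate(parent) if p == node)
        buffered -= n_children[node]   # consume the children's updates
        if parent[node] != -1:
            buffered += 1              # produce an update for the parent
        done.add(node)
        peak = max(peak, buffered)
    return peak


# Small example tree: 0 and 1 feed 2, 3 and 4 feed 5, 2 and 5 feed root 6.
example_parent = [2, 2, 6, 5, 5, 6, -1]
depth_first = postorder(example_parent)
level_by_level = [0, 1, 3, 4, 2, 5, 6]
```

On this example tree, the postorder schedule keeps fewer update matrices pending than the level-by-level schedule, which is the kind of buffering pressure a scheduler for on-chip memory would aim to reduce.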
Pages: 209-220
Page count: 12