Hardware-software optimizations of reconfigurable multi-core processors for floating-point computations of large sparse matrices

Cited by: 1
Authors
Wang, Xiaofang [1]
Affiliation
[1] Villanova University, Department of Electrical and Computer Engineering, Villanova, PA 19085, USA
Keywords
FPGA; Multi-core processor on a programmable chip; Parallel LU factorization; Hardware customization; Dynamic scheduling; Systems; Operations
DOI
10.1007/s11554-012-0277-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art field-programmable gate array (FPGA) technologies offer exciting opportunities to develop more flexible, less expensive, and higher-performance floating-point computing platforms for embedded systems. To better harness the full power of FPGAs and make them accessible to more system designers, we investigate the unique advantages and optimization opportunities, in both software and hardware, offered by multi-core processors on a programmable chip (MPoPCs). In this paper, we present our hardware customization and dynamic software scheduling solutions for LU factorization of large sparse matrices on in-house developed MPoPCs. Theoretical analysis is provided to guide the design. Implementation results on an Altera Stratix III FPGA are presented for five benchmark matrices of size up to 7,917 × 7,917. Hardware customization alone reduces execution time by up to 17.22%, and the integrated hardware-software optimization improves the speedup by an average of 60.30%.
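For readers unfamiliar with the kernel being optimized, a minimal sketch of LU factorization may help. The Python function below is a textbook right-looking (Doolittle) factorization without pivoting on a small dense matrix, assuming every pivot is nonzero. It is not the author's parallel sparse algorithm, and it models neither the MPoPC hardware nor the dynamic scheduler; the name `lu_decompose` and the skip-on-zero shortcut are illustrative assumptions. Within each elimination step `k`, the updates to the rows below the pivot are independent of one another, which is the kind of work a parallel LU implementation can distribute across cores.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting (illustrative sketch).

    Returns (l, u) as lists of lists, where l is unit lower-triangular and
    u is upper-triangular, so that l times u reproduces the input matrix.
    Assumes every pivot encountered is nonzero.
    """
    n = len(a)
    u = [list(map(float, row)) for row in a]  # working copy that becomes U
    l = [[float(i == j) for j in range(n)] for i in range(n)]  # starts as identity
    for k in range(n):  # k-th elimination step
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError(f"zero pivot at step {k}; pivoting would be required")
        # The row updates in this loop do not depend on each other,
        # so a parallel implementation could assign them to different cores.
        for i in range(k + 1, n):
            factor = u[i][k] / pivot
            if factor == 0:
                continue  # sparse shortcut: row i already has a zero in column k
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return l, u
```

For example, `lu_decompose([[4.0, 3.0], [6.0, 3.0]])` returns L = [[1.0, 0.0], [1.5, 1.0]] and U = [[4.0, 3.0], [0.0, -1.5]]. Skipping rows whose multiplier is zero is a simple way to avoid wasted floating-point work on sparse inputs; production sparse LU solvers go much further, tracking fill-in and ordering the matrix to keep it sparse.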
Pages: 187-204
Number of pages: 18