Hardware–software optimizations of reconfigurable multi-core processors for floating-point computations of large sparse matrices

Cited by: 0
Authors
Xiaofang Wang
Affiliation
[1] Villanova University,Department of Electrical and Computer Engineering
Source
Journal of Real-Time Image Processing | 2014 / Vol. 9
Keywords
FPGA; Multi-core processor on a programmable chip; Parallel LU factorization; Hardware customization; Dynamic scheduling
DOI
Not available
Abstract
State-of-the-art field-programmable gate array (FPGA) technologies offer exciting opportunities to develop more flexible, less expensive, and higher-performance floating-point computing platforms for embedded systems. To better harness the full power of FPGAs and to bring FPGAs to more system designers, we investigate the unique advantages and optimization opportunities, in both software and hardware, offered by multi-core processors on a programmable chip (MPoPCs). In this paper, we present our hardware customization and software dynamic scheduling solutions for LU factorization of large sparse matrices on MPoPCs developed in-house. Theoretical analysis is provided to guide the design. Implementation results on an Altera Stratix III FPGA are presented for five benchmark matrices of size up to 7,917 × 7,917. Our hardware customization alone can reduce the execution time by up to 17.22%. The integrated hardware–software optimization improves the speedup by an average of 60.30%.
Pages: 187-204
Page count: 17