FPGA-Based High-Performance and Scalable Block LU Decomposition Architecture

Cited by: 42
Authors
Jaiswal, Manish Kumar [1]
Chandrachoodan, Nitin [2]
Affiliations
[1] ICFAI Univ, Dehra Dun, India
[2] Indian Inst Technol, Dept Elect Engn, Madras 600036, Tamil Nadu, India
Keywords
LU decomposition; block LU; FPGA; hardware acceleration; floating point arithmetics; single/double precision; scaling; ATLAS; Intel-MKL; GPU; LINEAR ALGEBRA; STABILITY;
DOI
10.1109/TC.2011.24
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Decomposition of a matrix into lower and upper triangular matrices (LU decomposition) is a vital part of many scientific and engineering applications, and the block LU decomposition algorithm is an approach well suited to parallel hardware implementation. This paper presents an approach to speed up implementation of the block LU decomposition algorithm using FPGA hardware. Unlike most previous approaches reported in the literature, the approach does not assume the matrix can be stored entirely on chip. The memory accesses are studied for various FPGA configurations, and a schedule of operations for scaling well is shown. The design has been synthesized for FPGA targets and can be easily retargeted. The design outperforms previous hardware implementations, as well as tuned software implementations including the ATLAS and MKL libraries on workstations.
Pages: 60-72
Number of pages: 13
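
The abstract refers to the block LU decomposition algorithm without spelling it out. The following is a minimal software sketch of the textbook right-looking block LU, assuming no pivoting and a matrix whose leading blocks are nonsingular. It is not the paper's hardware schedule or memory-access plan, and the names `block_lu`, `split_lu`, and `matmul` are hypothetical helpers introduced only for illustration.

```python
def block_lu(a, b):
    """In-place right-looking block LU (no pivoting) with block size b.

    `a` is a square list of lists. On return, the strict lower triangle
    holds L (unit diagonal implied) and the upper triangle holds U.
    Assumption: every pivot encountered is nonzero.
    """
    n = len(a)
    for k in range(0, n, b):
        e = min(k + b, n)  # end of the current diagonal block
        # Step 1: unblocked LU of the diagonal block A11 = L11 * U11.
        for j in range(k, e):
            for i in range(j + 1, e):
                a[i][j] /= a[j][j]
                for c in range(j + 1, e):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 2: row panel, U12 = L11^{-1} * A12 (forward substitution).
        for j in range(k, e):
            for i in range(j + 1, e):
                for c in range(e, n):
                    a[i][c] -= a[i][j] * a[j][c]
        # Step 3: column panel, L21 = A21 * U11^{-1}.
        for r in range(e, n):
            for j in range(k, e):
                for c in range(k, j):
                    a[r][j] -= a[r][c] * a[c][j]
                a[r][j] /= a[j][j]
        # Step 4: trailing update, A22 = A22 - L21 * U12.
        for r in range(e, n):
            for c in range(e, n):
                for j in range(k, e):
                    a[r][c] -= a[r][j] * a[j][c]
    return a


def split_lu(a):
    """Split the packed result into explicit L (unit lower) and U (upper)."""
    n = len(a)
    lower = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


def matmul(x, y):
    """Plain dense matrix product, used only to check that L * U equals A."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Strictly diagonally dominant example, so no pivoting is needed.
    A = [[10.0, 1.0, 2.0, 3.0],
         [1.0, 12.0, 3.0, 1.0],
         [2.0, 3.0, 14.0, 2.0],
         [3.0, 1.0, 2.0, 16.0]]
    L, U = split_lu(block_lu([row[:] for row in A], 2))
    err = max(abs(p - q) for pr, qr in zip(matmul(L, U), A)
              for p, q in zip(pr, qr))
    print("max reconstruction error:", err)
```

Step 4 is a dense matrix product that dominates the operation count for large matrices, which is one reason the blocked form is generally regarded as suitable for parallel hardware. A production solver would normally add partial pivoting, which this sketch omits.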