FPGA-Based Hardware Matrix Inversion Architecture Using Hybrid Piecewise Polynomial Approximation Systolic Cells

Cited by: 6

Authors:

Vazquez-Castillo, Javier [1]

Castillo-Atoche, Alejandro [2]

Carrasco-Alvarez, Roberto [3]

Longoria-Gandara, Omar [4]

Ortegon-Aguilar, Jaime [1]

Affiliations:

[1] Univ Quintana Roo, Dept Engn, Chetumal 77019, Quintana Roo, Mexico

[2] Autonomous Univ Yucatan, Dept Mech, Merida 97203, Mexico

[3] Univ Guadalajara, Dept Elect, Guadalajara 44430, Jalisco, Mexico

[4] Western Inst Technol & Higher Educ, Dept Elect Syst & IT, Tlaquepaque 45604, Mexico

Source:

ELECTRONICS | 2020 / Vol. 9 / Issue 01

Keywords:

field programmable gate arrays; matrix inversion; piecewise polynomial approximation; QR decomposition; systolic arrays; SQUARE-ROOT; IMPLEMENTATION; MULTIPLIER;

DOI:

10.3390/electronics9010182

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Matrix inversion hardware based on QR decomposition with Givens Rotations (GR) and a back substitution (BS) block is required by many signal processing algorithms. However, a hardware GR implementation requires complex operations, such as the reciprocal square root (RSR), which is typically implemented using LookUp Tables (LUTs) or COordinate Rotation DIgital Computer (CORDIC) units, among others, leading to either high area consumption or low throughput. This paper introduces a Field-Programmable Gate Array (FPGA)-based full matrix inversion architecture using hybrid piecewise polynomial approximation systolic cells. In the design, a hybrid segmentation technique is incorporated for the implementation of the piecewise polynomial systolic cells. This hybrid approach is composed of an external and an internal segmentation, where the first is nonuniform and the second is uniform; it fits the curve shape of the complex functions, achieving a better signal-to-quantization-noise ratio, and it also improves time performance and area usage. Experimental results reveal a well-balanced improvement in the design, achieving high throughput with lower resource utilization in comparison to state-of-the-art FPGA-based architectures. In our study, the proposed design achieves 7.51 Mega-Matrices per second for 4 x 4 matrix operations with a latency of 12 clock cycles; meanwhile, the hardware design requires only 1474 slice registers and 1458 LUTs in a Virtex-5 XC5VLX220T FPGA, and 1474 slice registers and 1378 LUTs when a Virtex-6 XC6VLX240T FPGA is used.
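The hybrid segmentation idea in the abstract can be illustrated with a small floating-point software model. This is only a sketch under stated assumptions, not the authors' hardware design: the external (nonuniform) segments are assumed to be power-of-two intervals, each is assumed to be split into 8 uniform internal segments, and a degree-2 least-squares polynomial is fitted per segment. The names `build_table`, `rsr`, and `givens` and all parameter values are hypothetical.

```python
import math
import numpy as np

def build_table(k_lo=-8, k_hi=0, inner=8, degree=2, samples=64):
    """Fit one polynomial per segment of 1/sqrt(x) on [2**k_lo, 2**k_hi).

    Assumed scheme (not taken from the paper):
    - external segmentation: power-of-two intervals [2**k, 2**(k+1)),
      which are narrow near zero where 1/sqrt(x) is steep (nonuniform);
    - internal segmentation: each interval is cut into `inner`
      equal-width pieces (uniform).
    """
    table = []
    for k in range(k_lo, k_hi):
        lo = 2.0 ** k
        width = lo / inner  # the interval [lo, 2*lo) has length lo
        for j in range(inner):
            a = lo + j * width
            xs = np.linspace(a, a + width, samples)
            # Fit in the local coordinate t in [0, 1] for good conditioning.
            coeffs = np.polyfit((xs - a) / width, 1.0 / np.sqrt(xs), degree)
            table.append((a, width, coeffs))
    return k_lo, inner, table

def rsr(x, k_lo, inner, table):
    """Approximate 1/sqrt(x) by selecting a segment and evaluating its polynomial."""
    # frexp gives x = m * 2**e with m in [0.5, 1), so x lies in [2**(e-1), 2**e).
    _, e = math.frexp(x)
    k = e - 1
    lo = 2.0 ** k
    j = min(int((x - lo) / (lo / inner)), inner - 1)
    a, width, coeffs = table[(k - k_lo) * inner + j]
    return float(np.polyval(coeffs, (x - a) / width))

def givens(a, b, k_lo, inner, table):
    """Givens rotation coefficients (c, s) computed from the RSR approximation."""
    r = rsr(a * a + b * b, k_lo, inner, table)
    return a * r, b * r

if __name__ == "__main__":
    k_lo, inner, table = build_table()
    c, s = givens(0.6, 0.3, k_lo, inner, table)
    print(c, s, c * c + s * s)
```

In this model the segment index comes directly from the exponent and the leading mantissa bits of the input, which is the kind of addressing that a power-of-two outer segmentation makes cheap; a fixed-point hardware version would additionally quantize the coefficients and the input, which this sketch does not model.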


Pages: 14
