Architecture and FPGA Design of Dichotomous Coordinate Descent Algorithms

被引:86
作者
Liu, Jie [1 ]
Zakharov, Yuriy V. [1 ]
Weaver, Ben [1 ]
机构
[1] Univ York, Dept Elect, York YO10 5DD, N Yorkshire, England
关键词
Dichotomous coordinate descent (DCD) algorithm; DCD; field-programmable gate array (FPGA); least squares (LS); multiple-input-multiple-output (MIMO); multiuser detection; normal equations; recursive LS; OPTIMIZATION;
D O I
10.1109/TCSI.2009.2015725
中图分类号
TM [电工技术]; TN [电子技术、通信技术];
学科分类号
0808 ; 0809 ;
摘要
In the areas of signal processing and communications, such as antenna-array beamforming, adaptive filtering, multiuser and multiple-input-multiple-output (MIMO) detection, channel estimation and equalization, echo and interference cancellation, and others, solving linear systems of equations often provides an optimal performance. However, this is also a very complicated operation that designers try to avoid by proposing different suboptimal techniques. The dichotomous coordinate descent (DCD) algorithm allows linear systems of equations to be solved with high computational efficiency. In this paper, we present architectures and field-programmable gate-array (FPGA) designs of two variants of the DCD algorithm, which are known as cyclic and leading DCD algorithms. For each of these techniques, we present serial designs, group-2 and group-4 designs, as well as a design with parallel update of the residual vector for the cyclic DCD algorithm. These designs have different degrees of parallelism, thus enabling a tradeoff between FPGA resources and computation time. The serial designs require the smallest FPGA resources; they are well suited for applications where many parallel solvers are required, e. g., for detection in MIMO-orthogonal-frequency-division-multiplexing communication systems. The parallelism introduced in the proposed group-2 and group-4 designs allows faster convergence to the true solution at the expense of an increase in FPGA resources. The design with parallel update of the residual vector provides the fastest convergence speed; however, if the system size is high, it may result in a significant increase in FPGA resources. The proposed fixed-point designs provide an accuracy performance that is very close to the performance of floating-point counterparts and require significantly lower FPGA resources than techniques based on QR decomposition.
引用
收藏
页码:2425 / 2438
页数:14
相关论文
共 41 条
[1]   The Gauss-Seidel fast affine projection algorithm [J].
Albu, F ;
Kadlec, J ;
Coleman, N ;
Fagan, A .
2002 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS, 2002, :109-114
[2]  
[Anonymous], 2007, J Uncertain Syst
[3]  
[Anonymous], 2004, Digital Speech: Coding for Low Bit Rate Communication Systems
[4]  
[Anonymous], 2007, XCELL J
[5]  
Bateman A., 2002, The DSP Handbook, Algorithms, Applications andDesign Techniques
[6]   FPGA based embedded processing architecture for the QRD-RLS algorithm [J].
Boppana, D ;
Dhanoa, K ;
Kempa, J .
12TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2004, :330-331
[7]   CONJUGATE-GRADIENT TECHNIQUES FOR ADAPTIVE FILTERING [J].
BORAY, GK ;
SRINATH, MD .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-FUNDAMENTAL THEORY AND APPLICATIONS, 1992, 39 (01) :1-10
[8]  
Bose T, 2002, IEICE T FUND ELECTR, VE85A, P532
[9]  
Boyd SP., 2006, Convex Optimization
[10]  
CESEAR T, 2005, P SOFTW DEF RAD TECH