A Real-Time Embedded Heterogeneous GPU/FPGA Parallel System for Radar Signal Processing

Cited by: 0
Authors
Rupniewski, Marek [1 ]
Mazurek, Gustaw [1 ]
Gambrych, Jacek [1 ]
Nalecz, Marek [1 ]
Karolewski, Rafal [2 ]
Affiliations
[1] Warsaw University of Technology, Institute of Electronic Systems, Warsaw, Poland
[2] Cubiware, TiVo Group, Software Development Department, Warsaw, Poland
Source
2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld) | 2016
Keywords
DOI
10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.128
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline Classification Code
081203; 0835
Abstract
During the last decade, computing accelerated with graphics processing units (GPUs) has attracted the attention of signal processing engineers because of the enormous computational power and energy efficiency of GPUs. Many signal processing applications, in particular those related to modern radars, have benefited from this technology. However, the bottlenecks of GPU computing, namely the relatively slow data transfer to GPU memory and the large data chunks that must be fed to the GPU for it to reach its maximum computational performance, still restrict the use of the technology in some areas. In radar signal processing, both issues have to be addressed: the required throughput can be extremely high, and the data cannot be processed in arbitrarily large chunks because of the low processing latency required. In this paper, a heterogeneous radar processor consisting of FPGA and GPU devices is proposed and its model implementation is described. The presented performance analyses show that the primary design requirements, namely high data throughput, high overall computational performance, and low latency, are met. The first is achieved with the help of a Remote Direct Memory Access (RDMA) mechanism, the second by employing the Compute Unified Device Architecture (CUDA) technology, and the last by applying state-of-the-art programming techniques and establishing a latency/performance trade-off that satisfies the given design constraints.
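
The latency/performance trade-off mentioned in the abstract comes down to how large a chunk of samples is handed to the GPU at a time. As a rough illustration of that idea only, and not the paper's implementation, the following minimal CUDA sketch processes a sample stream in fixed-size chunks and overlaps host-to-device transfers with kernel execution using two CUDA streams; the chunk size, stream count, and the placeholder scale_kernel are assumptions made for the example, and data is staged through host memory rather than the FPGA-to-GPU RDMA path described in the paper.

// Illustrative sketch only (not the paper's implementation): chunked,
// stream-overlapped GPU processing of a sample stream. CHUNK_SAMPLES,
// NUM_STREAMS and scale_kernel are hypothetical; data is staged through
// host memory rather than the RDMA path used in the paper.
#include <cuda_runtime.h>
#include <cstdio>

#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                 \
                    cudaGetErrorString(err_), __FILE__, __LINE__);      \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Placeholder per-sample kernel standing in for the real radar chain.
__global__ void scale_kernel(const float *in, float *out, int n, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = gain * in[i];
}

int main(void)
{
    const int CHUNK_SAMPLES = 1 << 20;  // smaller chunks: lower latency
    const int NUM_CHUNKS    = 8;        // larger chunks: higher throughput
    const int NUM_STREAMS   = 2;

    // Pinned host buffers allow asynchronous DMA transfers.
    float *h_in, *h_out;
    CHECK(cudaMallocHost(&h_in,  (size_t)NUM_CHUNKS * CHUNK_SAMPLES * sizeof(float)));
    CHECK(cudaMallocHost(&h_out, (size_t)NUM_CHUNKS * CHUNK_SAMPLES * sizeof(float)));
    for (size_t i = 0; i < (size_t)NUM_CHUNKS * CHUNK_SAMPLES; ++i)
        h_in[i] = (float)(i % 1024);    // dummy input samples

    float *d_in[NUM_STREAMS], *d_out[NUM_STREAMS];
    cudaStream_t stream[NUM_STREAMS];
    for (int s = 0; s < NUM_STREAMS; ++s) {
        CHECK(cudaMalloc(&d_in[s],  CHUNK_SAMPLES * sizeof(float)));
        CHECK(cudaMalloc(&d_out[s], CHUNK_SAMPLES * sizeof(float)));
        CHECK(cudaStreamCreate(&stream[s]));
    }

    dim3 block(256), grid((CHUNK_SAMPLES + 255) / 256);
    for (int c = 0; c < NUM_CHUNKS; ++c) {
        int s = c % NUM_STREAMS;
        float *src = h_in  + (size_t)c * CHUNK_SAMPLES;
        float *dst = h_out + (size_t)c * CHUNK_SAMPLES;
        // Copy-in, compute and copy-out of chunk c overlap with the chunk
        // running in the other stream, hiding part of the transfer cost.
        CHECK(cudaMemcpyAsync(d_in[s], src, CHUNK_SAMPLES * sizeof(float),
                              cudaMemcpyHostToDevice, stream[s]));
        scale_kernel<<<grid, block, 0, stream[s]>>>(d_in[s], d_out[s],
                                                    CHUNK_SAMPLES, 2.0f);
        CHECK(cudaMemcpyAsync(dst, d_out[s], CHUNK_SAMPLES * sizeof(float),
                              cudaMemcpyDeviceToHost, stream[s]));
    }
    CHECK(cudaDeviceSynchronize());

    for (int s = 0; s < NUM_STREAMS; ++s) {
        CHECK(cudaFree(d_in[s]));
        CHECK(cudaFree(d_out[s]));
        CHECK(cudaStreamDestroy(stream[s]));
    }
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return 0;
}

Shrinking CHUNK_SAMPLES shortens the time from a sample's arrival to its result, but reduces transfer and kernel efficiency; enlarging it does the opposite, which is the kind of trade-off the authors balance against their design constraints.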
Pages: 1189-1197
Number of pages: 9