Design and evaluation of multi-GPU enabled Multiple Symbol Detection algorithm
Authors:
Ying Liu
Affiliation: University of Chinese Academy of Sciences, School of Computer and Control
Haixin Zheng
Affiliation: University of Chinese Academy of Sciences, School of Computer and Control
Renliang Zhao
Affiliation: University of Chinese Academy of Sciences, School of Computer and Control
Liheng Jian
Affiliation: University of Chinese Academy of Sciences, School of Computer and Control
Affiliations:
[1] University of Chinese Academy of Sciences, School of Computer and Control
[2] Chinese Academy of Sciences, Key Lab of Big Data Mining and Knowledge Management
[3] Academy of Equipment, School of Electronic, Electrical and Communication Engineering
[4] University of Chinese Academy of Sciences
Source:
The Journal of Supercomputing
2016, Volume 72
Keywords:
Parallel computing;
CUDA;
Multiple Symbol Detection;
Multi-GPU;
Demodulation;
Telemetry;
DOI:
Not available
Abstract:
Multiple Symbol Detection (MSD) is an important technique in digital signal processing. It estimates the sequence of the received signal according to the maximum-likelihood principle. Because of its high computational complexity, MSD algorithms have so far been implemented on specialized signal processing devices such as Field Programmable Gate Arrays (FPGAs). With the rapid development of CUDA, GPUs have successfully accelerated applications in a variety of domains. In this paper, we explore the use of CUDA-enabled GPUs to accelerate the MSD algorithm. The computational core of MSD, the sliding correlation problem, is formulated, and an efficient CUDA parallelization scheme is proposed. A CUDA-enabled MSD (CU-MSD) algorithm is implemented by adapting the CUDA-enabled sliding correlation. To further improve the scalability of CU-MSD, an implementation on multiple GPUs is proposed as well. Various optimization techniques are used to maximize performance. The performance of CU-MSD is evaluated on an MSD-based demodulation for a PCM/FM telemetry system. Four data sets from a real aerospace PCM/FM integrated baseband system were used in our experiments. The experimental results demonstrate up to 133.3× speedup using a single GPU and 514.64× speedup using 4 GPUs in a single server.
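The abstract names sliding correlation as the computational core of MSD but does not spell it out. The following is a minimal sequential sketch of a generic sliding correlation, not the paper's CU-MSD implementation. The function names `sliding_correlation` and `detect`, the use of correlation magnitude as the decision metric, and the toy inputs are all assumptions made for illustration. The point it is meant to show is that each output offset is computed independently of the others, which is the property that would let the work be spread across GPU threads and, in larger chunks, across several GPUs.

```python
def sliding_correlation(received, template):
    """Correlate `template` against every valid offset of `received`.

    Returns the magnitude |sum_k received[n + k] * conj(template[k])|
    for each offset n. Hypothetical helper, not taken from the paper.
    """
    m = len(template)
    out = []
    # Each offset n is independent of every other offset, so this outer
    # loop is the natural candidate for one-thread-per-offset parallelism.
    for n in range(len(received) - m + 1):
        acc = 0j
        for k in range(m):
            acc += received[n + k] * template[k].conjugate()
        out.append(abs(acc))
    return out


def detect(received, templates):
    """For each offset, return the index of the best-matching template.

    `templates` stands in for the candidate waveforms of the possible
    symbol sequences; all templates are assumed to have the same length.
    """
    scores = [sliding_correlation(received, t) for t in templates]
    n_offsets = len(scores[0])
    return [
        max(range(len(templates)), key=lambda i: scores[i][n])
        for n in range(n_offsets)
    ]


if __name__ == "__main__":
    rx = [1 + 0j, 1 + 0j, -1 + 0j, -1 + 0j]        # toy received samples
    candidates = [[1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j]]  # toy templates
    print(sliding_correlation(rx, candidates[0]))
    print(detect(rx, candidates))
```

In a real detector the number of candidate templates grows exponentially with the observation window, which is presumably where the high computational cost mentioned in the abstract comes from; this sketch ignores that and any optimization the authors apply.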