Accelerating 2D orthogonal matching pursuit algorithm on GPU

被引:0
作者
Yuan Dai
Dongjian He
Yong Fang
Long Yang
机构
[1] Northwest A&F University,College of Mechanical and Electronic Engineering
[2] College of Information Engineering,Laboratory Agri
[3] Northwest A&F University, and Bio
来源
The Journal of Supercomputing | 2014年 / 69卷
关键词
2D-OMP; Parallel computing; GPU; CUDA;
D O I
暂无
中图分类号
学科分类号
摘要
Two-dimensional orthogonal matching pursuit (2D-OMP) algorithm is an extension of the one-dimensional OMP (1D-OMP), whose complexity and memory usage are lower than the 1D-OMP when they are applied to 2D sparse signal recovery. However, the major shortcoming of the 2D-OMP still resides in long computing time. To overcome this disadvantage, we develop a novel parallel design strategy of the 2D-OMP algorithm on a graphics processing unit (GPU) in this paper. We first analyze the complexity of the 2D-OMP and point out that the bottlenecks lie in matrix inverse and projection. After adopting the strategy of matrix inverse update whose performance is superior to traditional methods to reduce the complexity of original matrix inverse, projection becomes the most time-consuming module. Hence, a parallel matrix–matrix multiplication leveraging tiling algorithm strategy is launched to accelerate projection computation on GPU. Moreover, a fast matrix–vector multiplication, a parallel reduction algorithm, and some other parallel skills are also exploited to boost the performance of the 2D-OMP further on GPU. In the case of the sensing matrix of size 128 ×\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} 256 (176 ×\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} 256, resp.) for a 256 ×\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} 256 scale image, experimental results show that the parallel 2D-OMP achieves 17×\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} to 41×\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} (24×\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document} to 62×\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\times $$\end{document}, resp.) speedup over the original C code compiled with the O2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_2$$\end{document} optimization option. Higher speedup would be further obtained with larger-size image recovery.
引用
收藏
页码:1363 / 1381
页数:18
相关论文
共 33 条
  • [1] Donoho DL(2006)Compressed sensing IEEE Trans Inf Theory 52 1289-1306
  • [2] Tropp J(2007)Signal recovery from random measurements via orthogonal matching pursuit IEEE Trans Inf Theory 53 4655-4666
  • [3] Gilbert A(2011)Orthogonal matching pursuit for sparse signal recovery with noise IEEE Trans Inf Theory 57 4680-4688
  • [4] Cai TT(2012)2D sparse signal recovery via 2D orthogonal matching pursuit Sci China Inf Sci 55 889-897
  • [5] Wang L(2008)Parallelization of cellular neural networks on GPU Pattern Recognit 41 2684-2692
  • [6] Yong FANG(2013)Graphics processing unit (GPU) programming strategies and trends in GPU computing J Parallel Distrib Comput 73 4-13
  • [7] JiaJi WU(2013)Efficient local search on the GPU—investigations on the vehicle routing problem J Parallel Distrib Comput 73 14-31
  • [8] Huang BM(2011)Computing prestack Kirchhoff time migration on general purpose GPU Comput Geosci 37 1702-1710
  • [9] Ho T-Y(2008)Performance evaluation of image processing algorithms on the GPU J Struct Biol 164 153-160
  • [10] Lam P-M(2011)Accurate GPU-accelerated surface integrals for moment computation Comput Aided Design 43 1284-1295