Image Autoregressive Interpolation Model Using GPU-Parallel Optimization

Cited by: 19
Authors
Wu, Jiaji [1]
Deng, Long [1]
Jeon, Gwanggil [1, 2]
Affiliations
[1] Xidian Univ, Sch Elect Engn, Xian 710071, Shaanxi, Peoples R China
[2] Incheon Natl Univ, Dept Embedded Syst Engn, Incheon 22012, South Korea
Funding
National Natural Science Foundation of China;
Keywords
Autoregressive model; CUDA; GPU; image interpolation; parallel optimization; ALGORITHM;
DOI
10.1109/TII.2017.2724205
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the growth of the consumer electronics industry, it is vital to develop algorithms for ultrahigh-definition products that are more effective and have lower time complexity. Image interpolation based on an autoregressive model has achieved significant improvements in image reconstruction over traditional algorithms, including a better peak signal-to-noise ratio (PSNR) and improved subjective visual quality of the reconstructed image. However, the time-consuming computation involved has become a bottleneck in these autoregressive algorithms, and because of the high time cost they are rarely used in industry for actual production. In this study, in order to meet the requirements of real-time reconstruction, we apply several Compute Unified Device Architecture (CUDA) optimization strategies, including shared-memory, register, and multi-GPU optimization, to make full use of the graphics processing unit (GPU), an NVIDIA Tesla K80. To make the algorithm more suitable for GPU-parallel optimization, we modify the training window to obtain a more concise matrix operation. Experimental results show that, while maintaining a high PSNR and subjective visual quality and taking the I/O transfer time into account, our algorithm achieves speedups of 147.3 times for the Lena image and 174.8 times for a 720p video, compared to the original single-threaded C CPU code compiled with -O2 optimization.
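The abstract uses PSNR as its objective quality measure. As a rough illustration only, the sketch below computes PSNR with the standard definition, 10·log10(MAX² / MSE), assuming 8-bit grayscale images stored as nested lists. The function name `psnr` and the `max_value` parameter are illustrative assumptions; this record does not describe the paper's actual evaluation code.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Return the PSNR in dB between two equal-sized grayscale images.

    Both images are lists of rows of pixel values. The 8-bit peak value
    (255) is an assumption, not something stated in the record.
    """
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same height")

    squared_error = 0.0
    count = 0
    for ref_row, rec_row in zip(reference, reconstructed):
        if len(ref_row) != len(rec_row):
            raise ValueError("images must have the same width")
        for ref_pixel, rec_pixel in zip(ref_row, rec_row):
            squared_error += (ref_pixel - rec_pixel) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the reconstructed image is closer to the reference; identical images give an infinite PSNR.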
Pages: 426-436
Page count: 11