Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

Cited by: 14
Authors
Su, Huayou [1]
Wen, Mei [1]
Wu, Nan [1]
Ren, Ju [1]
Zhang, Chunyuan [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci & Sci & Technol, Parallel & Distributed Proc Lab, Changsha 410073, Hunan, Peoples R China
Funding
National High Technology Research and Development Program of China (863 Program);
Keywords
ALGORITHM; DESIGN;
DOI
10.1155/2014/716020
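Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code
07; 0710; 09;
Abstract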
By reorganizing the execution order and optimizing the data structures, we propose an efficient parallel framework for an H.264/AVC encoder on massively parallel architectures, and we implement it with CUDA on NVIDIA GPUs. The framework parallelizes not only the compute-intensive components of the H.264 encoder but also the control-intensive ones, such as CAVLC and the deblocking filter. In addition, we propose a series of optimization methods: multiresolution multiwindow motion estimation, a multilevel parallel strategy that exposes as much parallelism in intra coding as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the H.264 encoder workload is offloaded to the GPU. Experimental results show that the parallel implementation achieves a 20-fold speedup over the serial program and meets the requirement of real-time HD encoding at 30 fps. At the same bitrate, the PSNR loss ranges from 0.14 dB to 0.77 dB. Analysis of the kernels shows that the speedup of the compute-intensive algorithms is proportional to the computational power of the GPU, whereas the performance of the control-intensive part (CAVLC) depends strongly on memory bandwidth, which offers insight for the design of new architectures.
Pages: 19
Related papers
39 in total
[1]   Video compression with parallel processing [J].
Ahmad, I ;
He, Y ;
Liou, ML .
PARALLEL COMPUTING, 2002, 28 (7-8) :1039-1078
[2]  
Baker M.A., 2009, P 7 IEEEACM INT C HA, P353, DOI 10.1145/1629435.1629484
[3]  
Bross B., 2012, High Efficiency Video Coding (HEVC) Text Specification Draft 9
[4]   Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder [J].
Chen, Tung-Chien ;
Chien, Shao-Yi ;
Huang, Yu-Wen ;
Tsai, Chen-Han ;
Chen, Ching-Yeh ;
Chen, To-Wei ;
Chen, Liang-Gee .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2006, 16 (06) :673-688
[5]  
Chen WN, 2008, 2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, P697
[6]  
Chen Z, 2002, JVTF017, P5
[7]   A novel cross-diamond search algorithm for fast block motion estimation [J].
Cheung, CH ;
Po, LM .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2002, 12 (12) :1168-1177
[8]   PARALLEL RATE-DISTORTION OPTIMIZED INTRA MODE DECISION ON MULTI-CORE GRAPHICS PROCESSORS USING GREEDY-BASED ENCODING ORDERS [J].
Cheung, Ngai-Man ;
Au, Oscar C. ;
Kung, Man-Cheung ;
Fan, Xiaopeng .
2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, :2309-2312
[9]   Video Coding on Multicore Graphics Processors [J].
Cheung, Ngai-Man ;
Fan, Xiaopeng ;
Au, Oscar C. ;
Kung, Man-Cheung .
IEEE SIGNAL PROCESSING MAGAZINE, 2010, 27 (02) :79-89
[10]   Highly Parallel Rate-Distortion Optimized Intra-Mode Decision on Multicore Graphics Processors [J].
Cheung, Ngai-Man ;
Au, Oscar C. ;
Kung, Man-Cheung ;
Wong, Peter H. W. ;
Liu, Chun Hung .
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2009, 19 (11) :1692-1703