An Optimized Parallel IDCT on Graphics Processing Units

Authors
Wang, Biao [1]
Alvarez-Mesa, Mauricio [1]
Chi, Chi Ching [1]
Juurlink, Ben [1]
Affiliation
[1] Tech Univ Berlin, Berlin, Germany
Source
EURO-PAR 2012: PARALLEL PROCESSING WORKSHOPS | 2013 / Vol. 7640
Keywords
IDCT; GPU; H.264; OpenCL; parallel programming
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. By exploiting the fact that, for real videos, most of the IDCT input data are zero-valued coefficients, we create a new compacted data representation that enables several optimizations. Experimental evaluations conducted on different GPUs show average speedups ranging from 1.7x to 7.4x compared to an optimized single-threaded SIMD CPU version.
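The abstract does not spell out the transform or the compacted layout, so the following is only a minimal sequential sketch under stated assumptions. It implements the standard H.264/AVC 4x4 inverse core transform (the integer butterfly, applied to rows, then columns, then rounded with (x + 32) >> 6) and pairs it with a hypothetical compaction step: 4x4 blocks whose 16 coefficients are all zero are dropped, and the surviving blocks are stored contiguously together with their original block indices. The names `compact_blocks`, `idct_compacted`, and the packed/index layout are illustrative assumptions, not the authors' OpenCL representation, and the code is plain C rather than a GPU kernel.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* One 1-D pass of the H.264/AVC 4x4 inverse core transform, in place on
   four values spaced `stride` apart. Assumes >> on negative ints is an
   arithmetic shift (implementation-defined in C, but what the standard means). */
static void idct4_1d(int32_t *v, size_t stride) {
    int32_t d0 = v[0], d1 = v[stride], d2 = v[2 * stride], d3 = v[3 * stride];
    int32_t e0 = d0 + d2;
    int32_t e1 = d0 - d2;
    int32_t e2 = (d1 >> 1) - d3;
    int32_t e3 = d1 + (d3 >> 1);
    v[0]          = e0 + e3;
    v[stride]     = e1 + e2;
    v[2 * stride] = e1 - e2;
    v[3 * stride] = e0 - e3;
}

/* Inverse transform of one dequantized 4x4 block (row-major) into residuals. */
void idct4x4(const int32_t coef[16], int32_t res[16]) {
    int32_t tmp[16];
    memcpy(tmp, coef, sizeof tmp);
    for (int i = 0; i < 4; i++) idct4_1d(&tmp[4 * i], 1); /* horizontal pass */
    for (int j = 0; j < 4; j++) idct4_1d(&tmp[j], 4);     /* vertical pass */
    for (int k = 0; k < 16; k++) res[k] = (tmp[k] + 32) >> 6;
}

/* Hypothetical compaction: copy only blocks with at least one nonzero
   coefficient into `packed`, recording each block's original position in
   `idx`. Both outputs must have room for `nblocks` blocks. Returns the
   number of blocks kept. */
size_t compact_blocks(const int32_t *coef, size_t nblocks,
                      int32_t *packed, uint32_t *idx) {
    size_t n = 0;
    for (size_t b = 0; b < nblocks; b++) {
        const int32_t *blk = coef + 16 * b;
        int nonzero = 0;
        for (int k = 0; k < 16; k++) {
            if (blk[k] != 0) { nonzero = 1; break; }
        }
        if (nonzero) {
            memcpy(packed + 16 * n, blk, 16 * sizeof(int32_t));
            idx[n] = (uint32_t)b;
            n++;
        }
    }
    return n;
}

/* Transform only the packed blocks and scatter them back; skipped
   (all-zero) blocks keep a zero residual. */
void idct_compacted(const int32_t *packed, const uint32_t *idx, size_t n,
                    size_t nblocks, int32_t *res) {
    memset(res, 0, 16 * nblocks * sizeof(int32_t));
    for (size_t i = 0; i < n; i++) {
        idct4x4(packed + 16 * i, res + 16 * (size_t)idx[i]);
    }
}
```

Skipping all-zero blocks is lossless here because such a block always yields an all-zero residual ((0 + 32) >> 6 = 0). On a GPU, a dense array of only the nonzero blocks would presumably let each work-item do useful work instead of transforming empty blocks, but how the paper actually maps its representation to work-items is not stated in this record.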
Pages: 155-164
Number of pages: 10