Parallel 3D fast wavelet transform on manycore GPUs and multicore CPUs

Cited by: 18
Authors
Franco, Joaquin [1]
Bernabe, Gregorio [1]
Fernandez, Juan [1]
Ujaldon, Manuel [2]
Affiliations
[1] Univ Murcia, Dept Comp Engn, E-30001 Murcia, Spain
[2] Univ Murcia, Comp Architect Dept, E-30001 Murcia, Spain
Source
ICCS 2010 - INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, PROCEEDINGS | 2010, Vol. 1, No. 1
Keywords
3D Fast Wavelet Transform; parallel programming; GPU; multicore
DOI
10.1016/j.procs.2010.04.122
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
GPUs have recently attracted our attention as accelerators for a wide variety of algorithms, including assorted examples within the image analysis field. Among them, wavelets are gaining popularity as solid tools for data mining and video compression, though this comes at the expense of a high computational cost. After proving the effectiveness of the GPU for accelerating the 2D Fast Wavelet Transform [1], we present in this paper a novel implementation on manycore GPUs and multicore CPUs for high-performance computation of the 3D Fast Wavelet Transform (3D-FWT). This algorithm poses a challenging access pattern on matrix operators demanding high sustainable bandwidth, as well as mathematical functions with remarkable arithmetic intensity on ALUs and FPUs. On the GPU side, we focus on CUDA programming to develop methods for an efficient mapping on manycores and to fully exploit the memory hierarchy, which the programmer must manage explicitly. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and well-known techniques like tiling and blocking are exploited to optimize the use of memory. Experimental results on an Nvidia Tesla C870 GPU and an Intel Core 2 Quad Q6700 CPU indicate that our implementation runs three times faster on the Tesla than on the CPU, and up to fifteen times faster when communications are neglected, which enables the GPU to process video in real time in many applications where the 3D-FWT is involved. © 2010 Published by Elsevier Ltd.
Pages: 1095-1104
Number of pages: 10
Related papers
17 in total
  • [1] *AMD, 2009, AMD STREAM COMP
  • [2] [Anonymous], 1992, CBMS-NSF Reg. Conf. Ser. in Appl. Math
  • [3] A new lossy 3-D wavelet transform for high-quality compression of medical video
    Bernabé, G
    González, J
    García, JM
    Duato, J
    [J]. 2000 IEEE EMBS INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY APPLICATIONS IN BIOMEDICINE, PROCEEDINGS, 2000, : 226 - 231
  • [4] FRANCO J, J REAL TIME IM UNPUB
  • [5] *GCC, 2009, GCC GNU COMP COLL
  • [6] ICC, 2009, INT SOFTW NETW
  • [7] A THEORY FOR MULTIRESOLUTION SIGNAL DECOMPOSITION - THE WAVELET REPRESENTATION
    MALLAT, SG
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1989, 11 (07) : 674 - 693
  • [8] Cache issues with JPEG-2000 wavelet lifting
    Meerwald, P
    Norcen, R
    Uhl, A
    [J]. VISUAL COMMUNICATIONS AND IMAGE PROCESSING 2002, PTS 1 AND 2, 2002, 4671 : 626 - 634
  • [9] *NVID, 2009, CUDA ZON MAINT NVID
  • [10] *NVID, 2009, TESL GPU COMP SOL