Parallel 3D fast wavelet transform on manycore GPUs and multicore CPUs

Cited by: 18
Authors
Franco, Joaquin [1]
Bernabe, Gregorio [1]
Fernandez, Juan [1]
Ujaldon, Manuel [2]
Affiliations
[1] Univ Murcia, Dept Comp Engn, E-30001 Murcia, Spain
[2] Univ Murcia, Comp Architect Dept, E-30001 Murcia, Spain
Keywords
3D Fast Wavelet Transform; parallel programming; GPU; multicore
DOI
10.1016/j.procs.2010.04.122
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
GPUs have recently attracted our attention as accelerators for a wide variety of algorithms, including assorted examples within the image analysis field. Among them, wavelets are gaining popularity as solid tools for data mining and video compression, though at the expense of a high computational cost. After proving the effectiveness of the GPU for accelerating the 2D Fast Wavelet Transform [1], we present in this paper a novel implementation on manycore GPUs and multicore CPUs for high-performance computation of the 3D Fast Wavelet Transform (3D-FWT). This algorithm poses a challenging access pattern on matrix operators, demanding high sustainable bandwidth, as well as mathematical functions with remarkable arithmetic intensity on ALUs and FPUs. On the GPU side, we focus on CUDA programming to develop methods for an efficient mapping on manycores and to fully exploit the memory hierarchy, which the programmer manages explicitly. On multicore CPUs, OpenMP and Pthreads are used as counterparts to maximize parallelism, and well-known techniques such as tiling and blocking are exploited to optimize memory usage. Experimental results on an Nvidia Tesla C870 GPU and an Intel Core 2 Quad Q6700 CPU indicate that our implementation runs three times faster on the Tesla and up to fifteen times faster when communications are neglected, which enables the GPU to process real-time video in many applications where the 3D-FWT is involved. (C) 2010 Published by Elsevier Ltd.
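The abstract does not spell out the transform itself. As a rough illustration only, the sketch below computes one decomposition level of a separable 3D wavelet transform in C. It applies a 1D transform along rows (x), then columns (y), then frames (z, time for video), with an OpenMP pragma standing in for the multicore parallelism mentioned above. It substitutes the simple orthonormal Haar filter for whatever filter the paper actually uses. The function names `haar_pass` and `fwt3d_haar_level`, the x-fastest memory layout, and the out-of-place ping-pong buffering are all assumptions of this sketch, not the authors' code. It also omits tiling, blocking, Pthreads, and the CUDA version.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the paper): one Haar analysis step along a
   single axis, out of place. Haar is a stand-in filter chosen for brevity.
   Layout assumption: element (x, y, z) lives at (z*ny + y)*nx + x.
   axis: 0 = x (rows), 1 = y (columns), 2 = z (frames). Along that axis,
   low-pass outputs fill the first half and high-pass outputs the second. */
static void haar_pass(const float *src, float *dst,
                      size_t nx, size_t ny, size_t nz, int axis)
{
    const float s = 0.70710678f;                     /* 1/sqrt(2) */
    const size_t n = axis == 0 ? nx : axis == 1 ? ny : nz;
    const size_t stride = axis == 0 ? 1 : axis == 1 ? nx : nx * ny;
    const size_t half = n / 2;
    const long long total = (long long)(nx * ny * nz);

    /* Every output element is independent, so the loop parallelizes
       directly; without -fopenmp the pragma is ignored and it runs serially. */
    #pragma omp parallel for
    for (long long idx = 0; idx < total; ++idx) {
        size_t i = (size_t)idx;
        /* coordinate of element i along the transformed axis */
        size_t c = axis == 0 ? i % nx
                 : axis == 1 ? (i / nx) % ny
                 : i / (nx * ny);
        size_t base = i - c * stride;        /* same line, axis coordinate 0 */
        size_t k = c < half ? c : c - half;  /* index of the input pair */
        float a = src[base + 2 * k * stride];
        float b = src[base + (2 * k + 1) * stride];
        dst[i] = c < half ? (a + b) * s      /* approximation */
                          : (a - b) * s;     /* detail */
    }
}

/* Hypothetical driver: one level of a separable 3D Haar FWT.
   vol holds nx*ny*nz floats (all dimensions even); work is a scratch
   buffer of the same size. The result is written back into vol. */
static void fwt3d_haar_level(float *vol, float *work,
                             size_t nx, size_t ny, size_t nz)
{
    assert(nx % 2 == 0 && ny % 2 == 0 && nz % 2 == 0);
    haar_pass(vol, work, nx, ny, nz, 0);   /* x: rows    */
    haar_pass(work, vol, nx, ny, nz, 1);   /* y: columns */
    haar_pass(vol, work, nx, ny, nz, 2);   /* z: frames  */
    memcpy(vol, work, nx * ny * nz * sizeof *vol);
}
```

For example, calling `fwt3d_haar_level` on a 4×4×2 volume filled with ones leaves 2√2 ≈ 2.83 in the four low-low-low coefficients and zero everywhere else. The total energy (sum of squares, 32) is preserved, as expected for an orthonormal filter. The x pass reads consecutive floats, but the y and z passes read samples `nx` and `nx*ny` floats apart. That strided access is one plausible reason the 3D-FWT demands high memory bandwidth, and it is the kind of pattern tiling and blocking aim to improve.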
Pages: 1095-1104
Number of pages: 10
Related papers
  • [1] Optimization techniques for 3D-FWT on systems with manycore GPUs and multicore CPUs
    Bernabe, Gregorio
    Cuenca, Javier
    Gimenez, Domingo
    2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2013, 18: 319-328
  • [2] Optimizing a 3D-FWT code in a heterogeneous cluster of multicore CPUs and manycore GPUs
    Bernabe, Gregorio
    Cuenca, Javier
    Gimenez, Domingo
    2013 25TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD), 2013: 97-104
  • [4] Improving an autotuning engine for 3D Fast Wavelet Transform on manycore systems
    Bernabe, Gregorio
    Cuenca, Javier
    Pedro Garcia, Luis
    Gimenez, Domingo
    JOURNAL OF SUPERCOMPUTING, 2014, 70(2): 830-844
  • [6] Heterogeneous parallel 3D image deconvolution on a cluster of GPUs and CPUs
    Domanski, L.
    Bednarz, T.
    Vallotton, P.
    Taylor, J.
    19TH INTERNATIONAL CONGRESS ON MODELLING AND SIMULATION (MODSIM2011), 2011: 613-619
  • [7] Optimizing 3D Convolutions for Wavelet Transforms on CPUs with SSE Units and GPUs
    Videau, Brice
    Marangozova-Martin, Vania
    Genovese, Luigi
    Deutsch, Thierry
    EURO-PAR 2013 PARALLEL PROCESSING, 2013, 8097: 826-837
  • [8] Fast 3D wavelet transform on multicore and many-core computing platforms
    Galiano, V.
    Lopez-Granado, O.
    Malumbres, M. P.
    Migallon, H.
    JOURNAL OF SUPERCOMPUTING, 2013, 65(2): 848-865
  • [10] Massively parallel regularized 3D inversion of potential fields on CPUs and GPUs
    Cuma, Martin
    Zhdanov, Michael S.
    COMPUTERS & GEOSCIENCES, 2014, 62: 80-87