An Enhanced Image Reconstruction Tool for Computed Tomography on GPUs

Cited by: 6
Authors
Yu, Xiaodong [1 ]
Wang, Hao [1 ]
Feng, Wu-chun [1 ]
Gong, Hao [2 ]
Cao, Guohua [2 ]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA
[2] Virginia Tech, Dept Biomed Engr & Mech, Blacksburg, VA 24060 USA
Source
ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2017 | 2017
Keywords
GPU; Computed Tomography; Image Reconstruction; Algebraic Reconstruction Technique; Sparse Matrix-Vector Multiplication; SpMV; Transposition; SCHEME; ART;
DOI
10.1145/3075564.3078889
CLC Number
TP301 [Theory, Methods];
Subject Classification Code
081202 ;
Abstract
The algebraic reconstruction technique (ART) is an iterative algorithm for computed tomography (CT) image reconstruction that delivers better image quality at a lower radiation dose than the industry-standard filtered back projection (FBP). However, the high computational cost of ART requires researchers to turn to high-performance computing to accelerate the algorithm. Alas, existing GPU approaches for ART suffer from inefficient designs of the compressed data structures and computational kernels. Thus, this paper presents cuART, our enhanced CUDA-based CT image reconstruction tool built on ART. It delivers a compression and parallelization solution for ART-based image reconstruction on GPUs. We analyze why popular GPU sparse-matrix libraries and formats, e.g., cuSPARSE, BRC, and CSR5, underperform on the ART algorithm, and propose a symmetry-based CSR format (SCSR) that further compresses the CSR data structure and optimizes data access for both SpMV and transposed SpMV (SpMV_T) via a permutation of the column indices. We also propose sorting-based and sorting-free blocking techniques that optimize the kernel computation by leveraging the sparsity patterns of the system matrix. As a result, cuART significantly reduces the memory footprint and enables practical CT datasets to fit into a single GPU. Experimental results on an NVIDIA Tesla K80 GPU show that our approach achieves up to 6.8x, 7.2x, and 5.4x speedups over counterparts that use cuSPARSE, BRC, and CSR5, respectively.
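For context, the two kernels the abstract refers to can be sketched in plain Python. This is an illustrative sketch of standard CSR SpMV and its transpose, not the paper's SCSR format or CUDA kernels; the function names are our own.

```python
# Illustrative sketch: sparse matrix-vector multiply (SpMV) and its
# transpose (SpMV_T) over the standard CSR format. ART alternates these
# two kernels (forward and back projection), which is why cuART serves
# both from a single compressed copy of the system matrix.

def spmv_csr(row_ptr, col_idx, vals, x, n_rows):
    """y = A @ x, with A stored in CSR (row_ptr, col_idx, vals)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmv_t_csr(row_ptr, col_idx, vals, x, n_cols):
    """y = A.T @ x from the SAME CSR arrays; the scattered writes to
    y[col_idx[k]] are what make SpMV_T hard to parallelize on a GPU."""
    y = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[col_idx[k]] += vals[k] * x[i]
    return y

# Tiny 2x3 example: A = [[1, 0, 2],
#                        [0, 3, 0]]
row_ptr, col_idx, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0], 2))  # [3.0, 3.0]
print(spmv_t_csr(row_ptr, col_idx, vals, [1.0, 1.0], 3))     # [1.0, 3.0, 2.0]
```

A serial SpMV_T like this is where GPU libraries lose efficiency: either the matrix is transposed explicitly (doubling memory) or atomic updates serialize the scattered writes, which is the bottleneck SCSR's column-index permutation targets.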
Pages: 97-106 (10 pages)