An Enhanced Image Reconstruction Tool for Computed Tomography on GPUs

Cited by: 6
Authors
Yu, Xiaodong [1 ]
Wang, Hao [1 ]
Feng, Wu-chun [1 ]
Gong, Hao [2 ]
Cao, Guohua [2 ]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA
[2] Virginia Tech, Dept Biomed Engr & Mech, Blacksburg, VA 24060 USA
Source
ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2017 | 2017
Keywords
GPU; Computed Tomography; Image Reconstruction; Algebraic Reconstruction Technique; Sparse Matrix-Vector Multiplication; SpMV; Transposition; SCHEME; ART;
DOI
10.1145/3075564.3078889
CLC Number
TP301 [Theory, Methods];
Subject Classification Code
081202 ;
Abstract
The algebraic reconstruction technique (ART) is an iterative algorithm for computed tomography (CT) image reconstruction that delivers better image quality at a lower radiation dose than the industry-standard filtered back projection (FBP). However, the high computational cost of ART requires researchers to turn to high-performance computing to accelerate the algorithm. Alas, existing GPU approaches for ART suffer from inefficient designs of the compressed data structures and computational kernels. Thus, this paper presents cuART, our enhanced CUDA-based CT image reconstruction tool built on ART. It delivers a compression and parallelization solution for ART-based image reconstruction on GPUs. We analyze why popular GPU sparse-matrix libraries and formats, e.g., cuSPARSE, BRC, and CSR5, underperform on the ART algorithm, and propose a symmetry-based CSR format (SCSR) that further compresses the CSR data structure and optimizes data access for both SpMV and transposed SpMV (SpMV_T) via a permutation of the column indices. We also propose sorting-based and sorting-free blocking techniques that optimize the kernel computation by leveraging the sparsity patterns of the system matrix. As a result, cuART significantly reduces the memory footprint and enables practical CT datasets to fit into a single GPU. Experimental results on an NVIDIA Tesla K80 GPU show that our approach achieves up to 6.8x, 7.2x, and 5.4x speedups over counterparts that use cuSPARSE, BRC, and CSR5, respectively.
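For context, the two kernels the abstract refers to can be sketched in plain Python. This is an illustrative sketch of standard CSR SpMV and its transpose, not the paper's SCSR format or CUDA kernels; the function names are our own.

```python
# Illustrative sketch: sparse matrix-vector multiply (SpMV) and its
# transpose (SpMV_T) over the standard CSR format. ART alternates these
# two kernels (forward and back projection), which is why cuART serves
# both from a single compressed copy of the system matrix.

def spmv_csr(row_ptr, col_idx, vals, x, n_rows):
    """y = A @ x, with A stored in CSR (row_ptr, col_idx, vals)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def spmv_t_csr(row_ptr, col_idx, vals, x, n_cols):
    """y = A.T @ x from the SAME CSR arrays; the scattered writes to
    y[col_idx[k]] are what make SpMV_T hard to parallelize on a GPU."""
    y = [0.0] * n_cols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[col_idx[k]] += vals[k] * x[i]
    return y

# Tiny 2x3 example: A = [[1, 0, 2],
#                        [0, 3, 0]]
row_ptr, col_idx, vals = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 1.0, 1.0], 2))  # [3.0, 3.0]
print(spmv_t_csr(row_ptr, col_idx, vals, [1.0, 1.0], 3))     # [1.0, 3.0, 2.0]
```

A serial SpMV_T like this is where GPU libraries lose efficiency: either the matrix is transposed explicitly (doubling memory) or atomic updates serialize the scattered writes, which is the bottleneck SCSR's column-index permutation targets.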
Pages: 97-106 (10 pages)