High-performance cone beam reconstruction using CUDA compatible GPUs

Cited by: 51
Authors
Okitsu, Yusuke [1]
Ino, Fumihiko [1]
Hagihara, Kenichi [1]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Suita, Osaka 5650871, Japan
Keywords
Cone beam reconstruction; Acceleration; GPU; CUDA; TECHNIQUE SART
DOI
10.1016/j.parco.2010.01.004
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Compute unified device architecture (CUDA) is a software development platform that allows C-like programs to run on NVIDIA graphics processing units (GPUs). This paper presents an acceleration method for cone beam reconstruction using CUDA-compatible GPUs. The proposed method accelerates the Feldkamp, Davis, and Kress (FDK) algorithm using three techniques: (1) off-chip memory access reduction to save memory bandwidth; (2) loop unrolling to hide memory latency; and (3) multithreading to exploit multiple GPUs. We describe how these techniques can be incorporated into the reconstruction code. We also present an analytical model for understanding reconstruction performance in multi-GPU environments. Experimental results show that the proposed method runs at 83% of the theoretical memory bandwidth, achieving a throughput of 64.3 projections per second (pps) when reconstructing a 512^3-voxel volume from 360 projections of 512^2 pixels each. This performance is 41% higher than that of the previous CUDA-based method and 24 times that of a CPU-based method optimized with vector intrinsics. Detailed analyses are also presented to show how effectively each acceleration technique improves the reconstruction performance of a naive method. We also demonstrate out-of-core reconstruction for large-scale datasets, up to a 1024^3-voxel volume. © 2010 Elsevier B.V. All rights reserved.
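The throughput figures in the abstract imply a few quantities that are not stated outright. The short Python sketch below derives them by simple arithmetic. It assumes that the 64.3 pps rate is sustained across all 360 projections, that "41% higher" and "24 times faster" are both throughput ratios, and that the function and key names are illustrative only. The derived values are rough estimates, not results reported by the authors.

```python
# Figures quoted in the abstract.
THROUGHPUT_PPS = 64.3      # projections per second, proposed method
NUM_PROJECTIONS = 360      # projections in the 512^3-voxel benchmark
GAIN_OVER_PREVIOUS = 0.41  # "41% higher" than the previous CUDA-based method
SPEEDUP_OVER_CPU = 24.0    # "24 times faster" than the CPU-based method


def implied_figures():
    """Derive quantities the abstract implies but does not state.

    Assumption: both comparisons are ratios of sustained throughput.
    """
    return {
        # Time to process every projection at the quoted rate.
        "reconstruction_time_s": NUM_PROJECTIONS / THROUGHPUT_PPS,
        # Throughput of the earlier CUDA-based method, if ours is 1.41x it.
        "previous_cuda_pps": THROUGHPUT_PPS / (1.0 + GAIN_OVER_PREVIOUS),
        # Throughput of the vectorized CPU baseline, if ours is 24x it.
        "cpu_pps": THROUGHPUT_PPS / SPEEDUP_OVER_CPU,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the full 360-projection reconstruction takes a little under six seconds on the GPU, while the CPU baseline would need on the order of two minutes for the same dataset.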
Pages: 129-141
Number of pages: 13