Accelerating the RTTOV-7 IASI and AMSU-A radiative transfer models on graphics processing units: evaluating central processing unit/graphics processing unit-hybrid and pure-graphics processing unit approaches

Cited by: 6
Authors
Mielikainen, Jarno [1]
Huang, Bormin [1]
Huang, Hung-Lung Allen [1]
Saunders, Roger [2]
Affiliations
[1] Univ Wisconsin, Space Sci & Engn Ctr, Cooperat Inst Meteorol Satellite Studies, Madison, WI 53706 USA
[2] Met Off, Exeter EX1 3PB, Devon, England
Source
JOURNAL OF APPLIED REMOTE SENSING | 2011 / Vol. 5
Keywords
radiative transfer model; RTTOV; IASI; AMSU-A; GPU; CUDA
DOI
10.1117/1.3658028
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The Radiative Transfer for TIROS Operational Vertical Sounder (RTTOV) model is a widely used radiative transfer model (RTM) for calculating radiances for satellite infrared and microwave sensors, including the 8461-channel Infrared Atmospheric Sounding Interferometer (IASI) and the 15-band Advanced Microwave Sounding Unit-A (AMSU-A). In the era of hyperspectral sounders with thousands of spectral channels, RTM computation becomes more time-consuming. RTM performance in operational numerical weather prediction systems still limits the number of channels used from hyperspectral sounders to only a few hundred. To take full advantage of such high-resolution infrared observations, a computationally efficient radiative transfer model is needed to facilitate satellite data assimilation. In this paper, we develop a parallel implementation of the RTTOV-7 IASI and AMSU-A RTMs that runs the predictor module on CPUs in a pipeline with the transmittance and radiance modules on NVIDIA many-core graphics processing units (GPUs). We show that concurrent execution of the RTTOV-7 IASI RTM on CPU and GPU, together with asynchronous data transfer from CPU to GPU, allows the GPU-accelerated code running on the 240-core NVIDIA Tesla C1060 to reach speedups of 461x and 1793x for 1- and 4-GPU configurations, respectively. To compute one day's worth of 1,296,000 IASI spectra, the CPU code running on one 3.0 GHz core of the host AMD Phenom II X4 940 CPU takes 2.8 days. GPU acceleration reduces the running time to 8.75 and 2.25 min on 1- and 4-GPU configurations, respectively. The speedups for the RTTOV AMSU-A RTM are 29x and 75x for 1 and 4 GPUs, respectively. To further boost the speedup of a multispectral RTM, we developed a novel pure-GPU version of the RTTOV AMSU-A RTM in which the predictor module also runs on GPUs, achieving a 96% reduction in host-to-device data transfer. The speedups for the pure-GPU AMSU-A RTM increase significantly to 56x and 125x for 1- and 4-GPU configurations, respectively. © 2011 Society of Photo-Optical Instrumentation Engineers (SPIE).
Pages: 14
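The running times quoted in the abstract follow from the CPU baseline and the reported speedups, and the 1- and 4-GPU speedups also imply a multi-GPU scaling efficiency. The short Python sketch below reproduces that arithmetic. The function names are illustrative, and the scaling-efficiency figure is a derived quantity that the abstract itself does not report.

```python
# Arithmetic check of the figures quoted in the abstract.
# Function names are illustrative; they are not taken from the paper.

CPU_DAYS = 2.8  # CPU time for one day's 1,296,000 IASI spectra (from the abstract)


def gpu_minutes(cpu_days: float, speedup: float) -> float:
    """Running time in minutes after applying a speedup to a CPU time in days."""
    return cpu_days * 24 * 60 / speedup


def scaling_efficiency(speedup_1gpu: float, speedup_ngpu: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved when going from 1 to n GPUs."""
    return speedup_ngpu / (n_gpus * speedup_1gpu)


if __name__ == "__main__":
    # IASI hybrid CPU/GPU version: speedups of 461x (1 GPU) and 1793x (4 GPUs)
    print(f"IASI, 1 GPU:  {gpu_minutes(CPU_DAYS, 461):.2f} min")
    print(f"IASI, 4 GPUs: {gpu_minutes(CPU_DAYS, 1793):.2f} min")

    # Derived (not reported in the abstract): 4-GPU scaling efficiency
    cases = [("IASI hybrid", 461, 1793),
             ("AMSU-A hybrid", 29, 75),
             ("AMSU-A pure-GPU", 56, 125)]
    for name, s1, s4 in cases:
        print(f"{name}: {scaling_efficiency(s1, s4, 4):.1%} of linear scaling")
```

Under this reading, the 8461-channel IASI case scales almost linearly across four GPUs, while the 15-band AMSU-A cases scale noticeably less well.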