Performance analysis of the FDTD method applied to holographic volume gratings: Multi-core CPU versus GPU computing

Cited by: 17
Authors
Frances, J. [1]
Bleda, S. [1, 2]
Neipp, C. [1, 2]
Marquez, A. [1, 2]
Pascual, I. [2, 3]
Belendez, A. [1, 2]
Affiliations
[1] Univ Alicante, Dept Phys Syst Engn & Signal Theory, E-03080 Alicante, Spain
[2] Univ Alicante, Univ Inst Phys Sci & Technol, E-03080 Alicante, Spain
[3] Univ Alicante, Dept Opt Pharmacol & Anat, E-03080 Alicante, Spain
Keywords
GPU Computing; CUDA; Holography; Gratings; OpenMP; SSE; SIMD; Speed up; PERFECTLY MATCHED LAYER; TIME-DOMAIN METHOD; WAVE-PROPAGATION; ELECTROMAGNETIC-WAVES; MAXWELLS EQUATIONS; MEDIA; BOUNDARY; IMPLEMENTATION; ABSORPTION; LIGHT
DOI
10.1016/j.cpc.2012.09.025
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The finite-difference time-domain (FDTD) method allows the electromagnetic field distribution to be analyzed as a function of time and space. Here the method is applied to holographic volume gratings (HVGs) to obtain the near-field distribution at optical wavelengths. This application usually requires simulating wide areas, which increases both memory use and processing time. In this work, we propose a specific implementation of the FDTD method that includes several add-ons for the precise simulation of optical diffractive elements. Values in the near-field region are computed with the grating illuminated by a plane wave at different angles of incidence, and absorbing boundaries are included as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, one for CPU and one for GPU, in order to analyze the improvement offered by the new NVIDIA Fermi GPU architecture over a highly tuned multi-core CPU as a function of the simulation size. In particular, the optimized CPU implementation takes advantage of the arithmetic and data-transfer streaming SIMD (single instruction, multiple data) extensions (SSE), written explicitly in the code, and of multi-threading by means of OpenMP directives. The FDTD and MM results are in good agreement, which validates our methodology. Moreover, the performance of the GPU is compared to that of the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations. © 2012 Elsevier B.V. All rights reserved.
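For readers unfamiliar with the method, the leapfrog update at the core of FDTD can be sketched in a few lines. The following is a minimal one-dimensional illustration in vacuum, not the authors' implementation: the function name `fdtd_1d`, the normalized units, the Gaussian soft source and the simple perfectly conducting end walls are all assumptions made for this sketch, and it includes neither the grating, the oblique plane-wave illumination nor the absorbing boundaries described in the abstract.

```python
import math


def fdtd_1d(n_cells=401, n_steps=200, courant=0.5, src=200, t0=40.0, width=12.0):
    """Hypothetical 1D vacuum FDTD sketch in normalized units.

    ez lives on integer grid points, hy on the half points between them.
    courant = c*dt/dx must be <= 1 for stability in 1D.
    """
    ez = [0.0] * n_cells          # electric field samples
    hy = [0.0] * (n_cells - 1)    # magnetic field samples (staggered grid)

    for step in range(n_steps):
        # Magnetic field update from the spatial difference of ez.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])

        # Electric field update from the spatial difference of hy.
        # ez[0] and ez[-1] are never touched, so they stay 0 (reflecting
        # walls; an assumption of this sketch, not an absorbing boundary).
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])

        # Additive ("soft") Gaussian source at one cell; illustrative values.
        ez[src] += math.exp(-((step - t0) / width) ** 2)

    return ez


if __name__ == "__main__":
    field = fdtd_1d()
    peak = max(range(len(field)), key=lambda i: abs(field[i]))
    print("largest |ez| is at cell", peak)
```

Within each of the two inner loops, every cell is updated independently of the others, which is the property that SIMD instructions, OpenMP threads and GPU threads can all exploit.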
Pages: 469-479
Number of pages: 11