Strategies to parallelize a finite element mesh truncation technique on multi-core and many-core architectures

Cited by: 1

Authors:

Badia, Jose M. ^{[1]}

Amor-Martin, Adrian ^{[2]}

Belloch, Jose A. ^{[3]}

Garcia-Castillo, Luis Emilio ^{[3]}

Affiliations:

[1] Univ Jaume I Castellon, Dept Ingn & Ciencia Comp, Avda Sos Baynat s-n, Castellon de La Plana 12071, Spain

[2] Univ Carlos III Madrid, Dept Teoria Senal & Comunicac, Avda Univ 30, Madrid 28911, Spain

[3] Univ Carlos III Madrid, Dept Tecnol Elect, Avda Univ 30, Madrid 28911, Spain

Source:

JOURNAL OF SUPERCOMPUTING | 2023 / Vol. 79 / No. 07

Keywords:

Parallel computing; CUDA; OpenMP; Finite elements; GPU; Radiation; Acceleration; Scattering

DOI:

10.1007/s11227-022-04975-6

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Achieving maximum parallel performance on multi-core CPUs and many-core GPUs is a challenging task that depends on multiple factors. These include, for example, the number and granularity of the computations and the way the memories of the devices are used. In this paper, we assess those factors by evaluating and comparing different parallelizations of the same problem on a multiprocessor containing a CPU with 40 cores and four P100 GPUs with Pascal architecture. As a case study, we use the convolutional operation behind a non-standard finite element mesh truncation technique in the context of open-region electromagnetic wave propagation problems. A total of six parallel algorithms implemented using OpenMP and CUDA have been used to carry out the comparison by leveraging the same levels of parallelism on both types of platforms. Three of the algorithms are presented for the first time in this paper, including a multi-GPU method, and two others are improved versions of algorithms previously developed by some of the authors. This paper presents a thorough experimental evaluation of the parallel algorithms on a radar cross-section prediction problem. Results show that the performance obtained on the GPU clearly exceeds that obtained on the CPU, all the more so when multiple GPUs are used to distribute both data and computations. Speedups close to 30 have been obtained on the CPU, while the multi-GPU version achieves speedups larger than 250.
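As a rough illustration of the kind of loop-level parallelism the abstract refers to, the following C sketch accumulates a convolution-like sum with OpenMP. It is not one of the six algorithms from the paper: the `Point3` type, the `kernel` function (a simple real-valued placeholder rather than an electromagnetic Green's function), and the name `accumulate_field` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-D point type; not taken from the paper. */
typedef struct { double x, y, z; } Point3;

/* Placeholder kernel: a smooth real function of the squared distance.
   A real implementation would evaluate a Green's function here. */
static double kernel(Point3 a, Point3 b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return 1.0 / (1.0 + dx * dx + dy * dy + dz * dz);
}

/* Computes out[i] = sum_j kernel(obs[i], src[j]) * weight[j].
   Every output entry is independent of the others, so the outer loop
   is distributed across threads. When compiled without OpenMP support
   the pragma is ignored and the loop simply runs serially. */
void accumulate_field(const Point3 *obs, size_t n_obs,
                      const Point3 *src, const double *weight, size_t n_src,
                      double *out) {
    assert(obs && src && weight && out);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n_obs; ++i) {
        double acc = 0.0;  /* private accumulator, one per iteration */
        for (size_t j = 0; j < n_src; ++j)
            acc += kernel(obs[i], src[j]) * weight[j];
        out[i] = acc;
    }
}
```

Parallelizing over the outer (observation) index avoids any reduction across threads. On a GPU, the same decomposition would typically map one output entry to one thread or thread block, although the abstract does not state which mapping the authors chose.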


Pages: 7648-7664

Page count: 17
