Parallelization of large vector similarity computations in a hybrid CPU plus GPU environment

Cited by: 11
Author
Czarnul, Pawel [1]
Affiliation
[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Dept Comp Architecture, Gdansk, Poland
Keywords
Hybrid parallelism; OpenMP; CUDA; Parallel programming; Optimization
DOI
10.1007/s11227-017-2159-7
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The paper presents the design, implementation and tuning of a hybrid parallel OpenMP+CUDA code for computing similarity between pairs of a large number of multidimensional vectors. The problem has a wide range of applications, so its optimization is of high importance, especially on the currently widespread hybrid CPU+GPU systems targeted in the paper. The following are presented and tested for the computation of all vector pairs: tuning of a GPU kernel with consideration of memory coalescing and use of shared memory; minimization of GPU memory allocation costs; optimization of CPU-GPU communication in terms of the size of data sent; overlapping of CPU-GPU communication and kernel execution; concurrent kernel execution; and determination of the best sizes for data batches processed on CPUs and GPUs, along with the best GPU grid sizes. It is shown that all codes scale in hybrid environments with various relative performances of compute devices, even when comparisons of different vector pairs take different amounts of time. Tests were performed on two high-performance hybrid systems: 2 x Intel Xeon E5-2640 CPU + 2 x NVIDIA Tesla K20m, and the latest-generation 2 x Intel Xeon CPU E5-2620 v4 + NVIDIA's Pascal-generation GTX 1070 cards. The results demonstrate the expected improvements and identify beneficial optimizations that are important for users incorporating such computations into parallel codes run on similar systems.
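To make the all-pairs problem described in the abstract concrete, the following is a minimal CPU-only sketch, not the paper's code. It assumes cosine similarity as the measure (the abstract does not name one), and the function names `cosine_similarity` and `all_pairs_similarity` are hypothetical. The OpenMP `schedule(dynamic)` clause is one plausible way to balance work when rows of the pair triangle, or individual comparisons, take different amounts of time; the GPU part (CUDA kernel, batching, CPU-GPU transfers) is omitted entirely.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two equal-length vectors.
// Assumption: the measure is illustrative; the abstract does not specify one.
double cosine_similarity(const std::vector<double>& a,
                         const std::vector<double>& b) {
    assert(a.size() == b.size());
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        dot += a[k] * b[k];
        norm_a += a[k] * a[k];
        norm_b += b[k] * b[k];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;  // convention for zero vectors
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Computes similarity for every unordered pair (i, j) with i < j.
// Results are packed row by row: (0,1), (0,2), ..., (n-2,n-1).
std::vector<double> all_pairs_similarity(
        const std::vector<std::vector<double>>& vectors) {
    const long n = static_cast<long>(vectors.size());
    std::vector<double> result(n > 1 ? static_cast<std::size_t>(n * (n - 1) / 2) : 0);

    // Row i holds n-1-i pairs, so rows are unequal in cost; dynamic
    // scheduling hands rows to threads as they become free.
    // Without OpenMP enabled the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        // Number of pairs stored before row i in the packed triangle.
        const std::size_t base = static_cast<std::size_t>(i * (2 * n - i - 1) / 2);
        for (long j = i + 1; j < n; ++j) {
            result[base + static_cast<std::size_t>(j - i - 1)] =
                cosine_similarity(vectors[i], vectors[j]);
        }
    }
    return result;
}
```

Each thread writes to a disjoint slice of `result`, so no synchronization is needed. In a hybrid setting such as the one the abstract describes, ranges of rows like these would be grouped into batches and distributed between CPU threads and GPUs; how to size those batches is one of the tuning questions the paper addresses.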
Pages: 768-786
Number of pages: 19