Utilizing Multiple Xeon Phi Coprocessors on One Compute Node

Cited by: 0
Authors
Dong, Xinnan [1]
Chai, Jun [1]
Yang, Jing [1]
Wen, Mei [1]
Wu, Nan [1]
Cai, Xing [2, 3]
Zhang, Chunyuan [1]
Chen, Zhaoyun [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci, Changsha 410073, Hunan, Peoples R China
[2] Simula Res Lab, NO-1325 Lysaker, Norway
[3] Univ Oslo, Dept Informat, NO-0316 Oslo, Norway
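Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract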
Future exascale systems are expected to adopt compute nodes that incorporate many accelerators. This paper therefore investigates how to program multiple Xeon Phi coprocessors within a single compute node. Besides a standard MPI-OpenMP programming approach, which belongs to the symmetric usage mode, two offload-mode programming approaches are considered. The first offload approach is conventional and uses compiler pragmas, whereas the second is new and combines two Intel APIs, the coprocessor offload infrastructure (COI) and the symmetric communication interface (SCIF), for low-latency communication. While the pragma-based approach allows simpler programming, the COI-SCIF approach has three advantages: (1) lower overhead when launching offloaded code, (2) higher data transfer bandwidths, and (3) more advanced asynchrony between computation and data movement. The low-level COI-SCIF approach is also shown to have benefits over its MPI-OpenMP counterpart. All the programming approaches are tested with a real-world 3D application, for which the COI-SCIF approach delivers the best performance on a Tianhe-2 compute node with three Xeon Phi coprocessors.
Pages: 68-81
Page count: 14
Related papers
50 items in total
  • [21] A Coprocessor Sharing-Aware Scheduler for Xeon Phi-based Compute Clusters
    Coviello, Giuseppe
    Cadambi, Srihari
    Chakradhar, Srimat
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [22] mD3DOCKxb: a deep parallel optimized software for molecular docking with Intel Xeon Phi Coprocessors
    Cheng, Qian
    Peng, Shaoliang
    Lu, Yutong
    Wu, Chengkun
    Wang, Haiqiang
    Liu, Xin
    Zhu, Weiliang
    Xu, Zhijian
    Zhang, Xinben
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 725 - 728
  • [23] mAMBER: Accelerating explicit solvent molecular dynamic with Intel Xeon Phi Many-Integrated Core Coprocessors
    Liu, Xin
    Peng, Shaoliang
    Yang, Canqun
    Wu, Chengkun
    Wang, Haiqiang
    Cheng, Qian
    Zhu, Weiliang
    Wang, Jinan
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 729 - 732
  • [24] Tera-Scale 1D FFT with Low-Communication Algorithm and Intel® Xeon Phi™ Coprocessors
    Park, Jongsoo
    Bikshandi, Ganesh
    Vaidyanathan, Karthikeyan
    Tang, Ping Tak Peter
    Dubey, Pradeep
    Kim, Daehyun
    2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,
  • [25] High-level support for hybrid parallel execution of C++ applications targeting Intel® Xeon Phi™ coprocessors
    Dokulil, Jiri
    Bajrovic, Enes
    Benkner, Siegfried
    Pllana, Sabri
    Sandrieser, Martin
    Bachmayer, Beverly
    2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2013, 18 : 2508 - 2511
  • [26] Load Balancing and Patch-Based Parallel Adaptive Mesh Refinement for Tsunami Simulation on Heterogeneous Platforms Using Xeon Phi Coprocessors
    Ferreira, Chaulio R.
    Bader, Michael
    PROCEEDINGS OF THE PLATFORM FOR ADVANCED SCIENTIFIC COMPUTING CONFERENCE (PASC17), 2017,
  • [27] A Unified Programming Model for Intra- and Inter-Node Offloading on Xeon Phi Clusters
    Noack, Matthias
    Wende, Florian
    Steinke, Thomas
    Cordes, Frank
    SC14: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2014, : 203 - 214
  • [28] Accelerating multiple replica molecular dynamics simulations using the Intel® Xeon Phi coprocessor
    Parks, Conor
    Huang, Lei
    Wang, Yang
    Ramkrishna, Doraiswami
    MOLECULAR SIMULATION, 2017, 43 (09) : 714 - 723
  • [29] Design and Implementation of the Linpack Benchmark for Single and Multi-Node Systems Based on Intel® Xeon Phi™ Coprocessor
    Heinecke, Alexander
    Vaidyanathan, Karthikeyan
    Smelyanskiy, Mikhail
    Kobotov, Alexander
    Dubtsov, Roman
    Henry, Greg
    Shet, Aniruddha G.
    Chrysos, George
    Dubey, Pradeep
    IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 126 - 137
  • [30] Performance Study of Monte Carlo Codes on Xeon Phi Coprocessors - Testing MCNP 6.1 and Profiling ARCHER Geometry Module on the FS7ONNi Problem
    Liu, Tianyu
    Wolfe, Noah
    Lin, Hui
    Zieb, Kris
    Ji, Wei
    Caracappa, Peter
    Carother, Christopher
    Xu, X. George
    ICRS-13 & RPSD-2016, 13TH INTERNATIONAL CONFERENCE ON RADIATION SHIELDING & 19TH TOPICAL MEETING OF THE RADIATION PROTECTION AND SHIELDING DIVISION OF THE AMERICAN NUCLEAR SOCIETY - 2016, 2017, 153