Performance Evaluation of NAS Parallel Benchmarks on Intel® Xeon Phi™

Cited by: 20
Authors
Ramachandran, Arunmoezhi [1]
Vienne, Jerome [2]
Van der Wijngaart, Rob [3]
Koesterke, Lars [2]
Sharapov, Ilya [3]
Affiliations
[1] Univ Texas Dallas, Richardson, TX 75083 USA
[2] Univ Texas Austin, Texas Adv Comp Ctr, Austin, TX USA
[3] Intel Corp, Santa Clara, CA USA
Source
2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP) | 2013
Funding
U.S. National Science Foundation
Keywords
Parallel programming; Multicore processing; Performance analysis;
DOI
10.1109/ICPP.2013.87
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
NAS parallel benchmarks (NPB) are a set of applications commonly used to evaluate parallel systems. We use the NPB-OpenMP version to examine the performance of Intel's new Xeon Phi co-processor and focus specifically on the many-core aspect of the Xeon Phi architecture. A first analysis studies the scalability up to 244 threads on 61 cores and the impact of affinity settings on scaling, and compares performance characteristics of Xeon Phi and traditional Xeon CPUs. The application of several well-established optimization techniques allows us to identify common bottlenecks that can specifically impede performance on the Xeon Phi but are not as severe on multi-core CPUs. We also find that many of the OpenMP-parallel loops are too short (in terms of the number of loop iterations) for a balanced execution by 244 threads. New or redesigned benchmarks will be needed to accommodate the greatly increased number of cores and threads. Finally, we summarize our findings in a set of recommendations for performance optimization for Xeon Phi.
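The abstract's remark about loops being too short for 244 threads can be illustrated with a small back-of-the-envelope model. The Python sketch below is not taken from the paper: it assumes every iteration costs the same and that the loop is divided as evenly as possible among threads, and the function name `static_schedule_efficiency` and the trip counts in the demo are hypothetical choices made for illustration only.

```python
import math


def static_schedule_efficiency(iterations: int, threads: int) -> float:
    """Upper bound on parallel efficiency for an evenly divided loop.

    Assumption (not from the paper): all iterations cost the same, so the
    loop finishes when the busiest thread, which holds
    ceil(iterations / threads) iterations, is done.
    """
    if iterations <= 0 or threads <= 0:
        raise ValueError("iterations and threads must be positive")
    busiest = math.ceil(iterations / threads)
    # Useful work divided by the work slots available (threads * busiest).
    return iterations / (threads * busiest)


if __name__ == "__main__":
    # Hypothetical trip counts, chosen only to show the effect.
    for n in (64, 162, 244, 300, 1024):
        eff = static_schedule_efficiency(n, 244)
        print(f"{n:5d} iterations on 244 threads: efficiency <= {eff:.2f}")
```

Under this simplified model, a loop with fewer iterations than threads leaves some threads idle, and a loop with slightly more iterations than threads forces one extra round in which most threads have nothing to do. Real behavior also depends on scheduling policy, affinity, and memory effects, which this sketch ignores.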
Pages: 736-743
Page count: 8