Towards Performance and Scalability Analysis of Distributed Memory Programs on Large-Scale Clusters

Cited by: 1
Authors
Medya, Sourav [1, 2]
Cherkasova, Ludmila [2]
Magalhaes, Guilherme [3]
Ozonat, Kivanc [2]
Padmanabha, Chaitra [3]
Sarma, Jiban [3]
Sheikh, Imran [3]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
[3] Hewlett Packard Enterprise, Palo Alto, CA USA
DOI
10.1145/2851553.2858669
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Many HPC and modern Big Data processing applications belong to the class of so-called scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Understanding and assessing the scalability of an application is one of the primary goals during its implementation. Typically, in the design and implementation phase, the programmer is limited to a small cluster for debugging and profiling experiments. The challenge is to assess how the program will scale when executed on a larger cluster. Although each node in a larger cluster processes a smaller fraction of the original dataset, the communication volume and communication time may increase significantly, which can offset the gains and lead to diminishing performance benefits. Distributed memory applications exhibit complex behavior: they tend to interleave computation and communication, use bursty transfers, and rely on global synchronization primitives. Therefore, one of the main challenges is analyzing the bandwidth demands caused by the increased communication volume as a function of cluster size. In this paper, we introduce a novel approach to assess the scalability and performance of a distributed memory program for execution on a large-scale cluster. Our solution involves 1) a limited set of traditional experiments performed on a medium-size cluster and 2) an additional set of similar experiments performed with an "interconnect bandwidth throttling" tool, which enables the assessment of the communication demands with respect to the available bandwidth. This approach makes it possible to predict the cluster size at which communication cost becomes the dominant component, beyond which further increasing the cluster size yields diminishing returns. We demonstrate the proposed approach using the popular Graph500 benchmark.
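As a rough illustration of how unthrottled and throttled runs could feed a prediction of the crossover cluster size, here is a minimal sketch under a deliberately simple model that is an assumption, not the paper's method. It assumes (hypothetically) that runtime at bandwidth fraction f is T(f) = T_comp + T_comm / f with no overlap of computation and communication, and that both components follow power laws in the cluster size N. The function names `split_runtime`, `fit_power_law`, and `predict_crossover` are hypothetical, and the input numbers are synthetic values generated from a known model, not measurements.

```python
import math


def split_runtime(t_full, t_throttled, fraction):
    """Split one run's time into compute and communication parts.

    Hypothetical model (an assumption, not the paper's method):
    T(f) = T_comp + T_comm / f, where f is the fraction of interconnect
    bandwidth left by throttling and computation does not overlap communication.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    # T(f) - T(1) = T_comm * (1/f - 1)
    t_comm = (t_throttled - t_full) / (1.0 / fraction - 1.0)
    return t_full - t_comm, t_comm


def fit_power_law(ns, ts):
    """Least-squares fit of t = c * n**k in log-log space; returns (c, k)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx
    return math.exp(y_bar - k * x_bar), k


def predict_crossover(ns, t_comp, t_comm):
    """Cluster size at which fitted communication time equals fitted compute time."""
    a, alpha = fit_power_law(ns, t_comp)
    b, beta = fit_power_law(ns, t_comm)
    if beta <= alpha:
        # Communication's share of the runtime does not grow with N in this model.
        return math.inf
    # Solve a * N**alpha == b * N**beta for N.
    return (a / b) ** (1.0 / (beta - alpha))


# Synthetic inputs for illustration only (not measurements from the paper):
# compute time 1000/N s, communication time 2*sqrt(N) s, throttled runs at 50% bandwidth.
sizes = [4, 8, 16, 32]
full = [1000 / n + 2 * math.sqrt(n) for n in sizes]
half = [1000 / n + 2 * math.sqrt(n) / 0.5 for n in sizes]

parts = [split_runtime(tf, th, 0.5) for tf, th in zip(full, half)]
comp = [p[0] for p in parts]
comm = [p[1] for p in parts]
print(round(predict_crossover(sizes, comp, comm), 1))  # prints 63.0
```

The additive, power-law form is only one possible choice. Bursty transfers and global synchronization, which the abstract notes are typical of these programs, can overlap or serialize communication in ways this simple model ignores.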
Pages: 113-116
Number of pages: 4