Towards Performance and Scalability Analysis of Distributed Memory Programs on Large-Scale Clusters

Cited by: 1
Authors
Medya, Sourav [1, 2]
Cherkasova, Ludmila [2]
Magalhaes, Guilherme [3]
Ozonat, Kivanc [2]
Padmanabha, Chaitra [3]
Sarma, Jiban [3]
Sheikh, Imran [3]
Affiliations
[1] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
[3] Hewlett Packard Enterprise, Palo Alto, CA USA
DOI
10.1145/2851553.2858669
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Many HPC and modern Big Data processing applications belong to the class of so-called scale-out applications, where the application dataset is partitioned and processed by a cluster of machines. Understanding and assessing the scalability of the designed application is one of the primary goals during application implementation. Typically, in the design and implementation phase, the programmer is bound to a limited-size cluster for debugging and profiling experiments. The challenge is to assess how the designed program will scale when executed on a larger cluster. In a larger cluster, each node processes a smaller fraction of the original dataset, but the communication volume and communication time might increase significantly, which can erode the performance benefits of adding nodes. Distributed memory applications exhibit complex behavior: they tend to interleave computation and communication, use bursty transfers, and rely on global synchronization primitives. Therefore, one of the main challenges is analyzing the bandwidth demands caused by increased communication volume as a function of cluster size. In this paper, we introduce a novel approach to assess the scalability and performance of a distributed memory program for execution on a large-scale cluster. Our solution involves 1) a limited set of traditional experiments performed on a medium-size cluster and 2) an additional set of similar experiments performed with an "interconnect bandwidth throttling" tool, which enables the assessment of communication demands with respect to available bandwidth. This approach enables prediction of the cluster size at which communication cost becomes the dominant component, the point beyond which enlarging the cluster yields diminishing returns. We demonstrate the proposed approach using the popular Graph500 benchmark.
Pages: 113-116
Page count: 4