SciLance: Mitigate Load Imbalance for Parallel Scientific Applications in Cloud Environments

Cited by: 0
Authors
Wang, Xinying [1]
Wan, Lipeng [2]
Klasky, Scott [3]
Zhao, Dongfang [4]
Yan, Feng [5]
Affiliations
[1] Univ Nevada, Reno, NV 89557 USA
[2] Georgia State Univ, Atlanta, GA USA
[3] Oak Ridge Natl Lab, Oak Ridge, TN USA
[4] Univ Washington, Tacoma, WA USA
[5] Univ Houston, Houston, TX USA
Source
2023 IEEE International Conference on Cluster Computing (CLUSTER) | 2023
Funding
U.S. National Science Foundation
Keywords
load balancing; resource management; parallel computing;
DOI
10.1109/CLUSTER52292.2023.00012
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Elastic cloud computing provides new opportunities for accelerating scientific discovery. However, unlike high-performance computing (HPC) systems, which are built and optimized for workloads with intensive inter-node communication demands, cloud platforms offer low-latency, high-bandwidth communication only on a few very expensive high-end instance types, which leads to poor cost-effectiveness. In addition, many scientific simulations mitigate load imbalance by re-balancing the workload through extra data movement among compute nodes, which further amplifies the communication pressure and makes it challenging to use cloud resources efficiently. To this end, we propose SciLance, which addresses the workload imbalance challenge by utilizing the heterogeneous and elastic resources offered by cloud platforms. In particular, instead of moving data excessively among compute instances to balance the workload, SciLance dynamically adjusts the compute instances used for running parallel tasks based on the runtime imbalance identified through profiling. We prototype SciLance and perform extensive evaluation using adaptive mesh refinement (AMR) based scientific applications. The evaluation results demonstrate that SciLance can achieve up to 36.63% better performance with 16.91% lower cost for AMR-based simulation codes.
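The idea in the abstract, resizing compute instances to match profiled load rather than moving data, can be illustrated with a minimal Python sketch. This is not SciLance's actual algorithm, which the abstract does not specify. The instance catalog, its speed values, the `imbalance` metric (slowest task time over mean task time, minus one), and the `pick_instances` heuristic are all assumptions introduced here for illustration only.

```python
from statistics import mean

# Hypothetical instance catalog: name -> relative compute speed (assumed values).
INSTANCE_SPEED = {"small": 1.0, "medium": 2.0, "large": 4.0}


def imbalance(times):
    """Return how far the slowest task is above the average (0.0 = balanced)."""
    return max(times) / mean(times) - 1.0


def pick_instances(loads, catalog=INSTANCE_SPEED):
    """For each task, pick the slowest instance type (assumed to be the
    cheapest) that keeps the task's estimated time within a target time."""
    by_speed = sorted(catalog.items(), key=lambda kv: kv[1])
    # Assumed target: the time the lightest task takes on the slowest type.
    target = min(loads) / by_speed[0][1]
    plan = []
    for load in loads:
        choice = by_speed[-1][0]  # fall back to the fastest type
        for name, speed in by_speed:
            if load / speed <= target:
                choice = name
                break
        plan.append(choice)
    return plan


# Profiled per-task work from a hypothetical AMR step (arbitrary units).
loads = [10, 20, 40, 10]
plan = pick_instances(loads)
times_after = [load / INSTANCE_SPEED[name] for load, name in zip(loads, plan)]
print(plan, imbalance(loads), imbalance(times_after))
```

In this toy setting the heavier tasks are assigned to faster instance types, so the estimated per-task times even out without any data being moved between instances. A real system would also have to weigh instance prices and the cost of switching instances, which this sketch ignores.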
Pages: 49-59
Number of pages: 11