Autonomic Resource Management for Program Orchestration in Large-scale Data Analysis

Times cited: 0
Authors
Tanaka, Masahiro [1]
Taura, Kenjiro [2]
Torisawa, Kentaro [1]
Affiliations
[1] Natl Inst Informat & Commun Technol NICT, Data Driven Intelligent Syst Res Ctr DIRECT, Universal Commun Res Inst, 3-5 Hikaridai,Seika Cho, Kyoto 6190289, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1130033, Japan
Source
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2017
Keywords
Large-scale data processing; program composition; service composition; self-tuning; resource management
DOI
10.1109/IPDPS.2017.89
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Large-scale data analysis applications are becoming increasingly prevalent in a wide variety of areas. These applications are composed of many existing programs, called analysis components, and thousands of analysis component processes are orchestrated across many compute nodes. This paper proposes a novel self-tuning framework for optimizing an application's throughput in large-scale data analysis. One challenge is developing efficient orchestration that accounts for the diversity of analysis components and the varying performance of compute nodes. In our previous work, we achieved such orchestration to a certain degree by introducing our own middleware, which wraps each analysis component as a remote procedure call (RPC) service. The middleware also pools processes to reduce startup overhead, which is a serious obstacle to achieving high throughput. This work tackles the remaining task: tuning the size of each analysis component's process pool to maximize the application's throughput. This is challenging because analysis components differ drastically in turnaround times and memory footprints. The pool size for each type of analysis component should be set with these properties in mind, subject to constraints on both memory capacity and processor core counts. We formulate this task as a linear programming problem and obtain the optimal pool sizes by solving it. Compared to our previous work, we significantly improved the scalability of our framework by reformulating the performance model to work on hundreds of heterogeneous nodes. We also extended the service allocation mechanism to manage the computational load on each compute node and reduce communication overhead. The experimental results show that our approach scales to thousands of analysis component processes running on 200 compute nodes across three clusters. Moreover, our approach significantly reduces memory footprint.
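The abstract does not give the exact formulation. The sketch below is therefore a simplified single-node model of our own, not the authors' model, meant only to show why pool sizing fits linear programming.

The model makes these assumptions:
- Each component type i is called r_i times per input item.
- Each call keeps one pooled process busy for t_i seconds.
- Each busy process occupies one core and holds m_i GB of memory.

Pool i then serves n_i / t_i calls per second. Keeping up with an application throughput of x items per second therefore requires n_i >= r_i t_i x. The pools are also bounded by Σ m_i n_i <= M and Σ n_i <= C.

Adding processes beyond what demand requires only consumes resources. Every demand constraint is therefore tight at the optimum, and the LP optimum has the closed form x* = min(C / Σ r_i t_i, M / Σ m_i r_i t_i). The names `Component`, `optimal_pool_sizes`, and `comp_a`..`comp_c`, and all numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One analysis-component type (hypothetical model parameters)."""
    name: str
    calls_per_item: float  # r_i: calls to this component per input item
    turnaround_s: float    # t_i: seconds one pooled process is busy per call
    memory_gb: float       # m_i: memory held by one pooled process


def optimal_pool_sizes(components, memory_gb, cores):
    """Optimum of the (assumed, simplified) LP
        maximize   x
        subject to n_i >= r_i * t_i * x   (pool i keeps up with demand)
                   sum_i m_i * n_i <= M   (memory capacity)
                   sum_i n_i <= C         (one core per busy process)
                   x, n_i >= 0
    Extra processes only consume resources, so n_i = r_i * t_i * x at the
    optimum and the LP collapses to a minimum over the two capacity limits.
    """
    # Busy process-seconds each component needs per input item (r_i * t_i).
    work = {c.name: c.calls_per_item * c.turnaround_s for c in components}
    cores_per_item = sum(work.values())
    mem_per_item = sum(work[c.name] * c.memory_gb for c in components)
    throughput = min(cores / cores_per_item, memory_gb / mem_per_item)
    # Fractional LP pool sizes; a deployed system would round these to integers.
    pools = {name: w * throughput for name, w in work.items()}
    return throughput, pools


if __name__ == "__main__":
    demo = [
        Component("comp_a", calls_per_item=1, turnaround_s=0.5, memory_gb=2.0),
        Component("comp_b", calls_per_item=2, turnaround_s=0.1, memory_gb=0.5),
        Component("comp_c", calls_per_item=1, turnaround_s=2.0, memory_gb=4.0),
    ]
    x, pools = optimal_pool_sizes(demo, memory_gb=128, cores=64)
    print(f"throughput: {x:.2f} items/s")
    for name, n in pools.items():
        print(f"  {name}: {n:.2f} processes")
```

With these made-up numbers, the 128 GB memory limit binds before the 64-core limit. The memory-heavy, slow `comp_c` pool therefore receives most of the processes. Rounding each n_i down keeps the solution feasible but can lower throughput. The multi-node, heterogeneous setting described in the abstract would likely need a general LP solver (for example `scipy.optimize.linprog`) rather than this closed form.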
Pages: 1088-1097
Page count: 10