Autonomic Resource Management for Program Orchestration in Large-scale Data Analysis

Authors
Tanaka, Masahiro [1]
Taura, Kenjiro [2]
Torisawa, Kentaro [1]
Affiliations
[1] Natl Inst Informat & Commun Technol NICT, Data Driven Intelligent Syst Res Ctr DIRECT, Universal Commun Res Inst, 3-5 Hikaridai,Seika Cho, Kyoto 6190289, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1130033, Japan
Source
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2017
Keywords
Large-scale data processing; program composition; service composition; self-tuning; resource management
DOI
10.1109/IPDPS.2017.89
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Large-scale data analysis applications are becoming more and more prevalent in a wide variety of areas. These applications are composed of many currently available programs called analysis components. Thousands of analysis component processes are orchestrated on many compute nodes. This paper proposes a novel self-tuning framework for optimizing an application's throughput in large-scale data analysis. One challenge is developing efficient orchestration that takes into account the diversity of analysis components and the varying performances of compute nodes. In our previous work, we achieved such an orchestration to a certain degree by introducing our own middleware, which wraps each analysis component as a remote procedure call (RPC) service. The middleware also pools the processes to reduce startup overhead, which is a serious obstacle to achieving high throughput. This work tackles the remaining task of tuning the size of the analysis components' process pools to maximize the application's throughput. This is challenging because analysis components differ drastically in turnaround times and memory footprints. The size of the process pool for each type of analysis component should be set by giving consideration to these properties as well as the constraints on both the memory capacity and the processor core counts. In this work, we formulate this task as a linear programming problem and obtain the optimal pool sizes by solving it. Compared to our previous work, we significantly improved the scalability of our framework by reformulating the performance model to work on hundreds of heterogeneous nodes. We also extended the service allocation mechanism to manage the computational load on each compute node and reduce communication overhead. The experimental results show that our approach is scalable to thousands of analysis component processes running on 200 compute nodes across three clusters. Moreover, our approach significantly reduces memory footprint.
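The pool-sizing idea in the abstract can be illustrated with a deliberately simplified model. The sketch below is not the paper's formulation: it assumes a single compute node, assumes every request passes through every analysis component once, and allows fractional pool sizes. Under those assumptions the linear program "maximize throughput t subject to pool_i / turnaround_i >= t, a total memory limit, and a total core limit" has a closed-form optimum, so no solver is needed. All names (`Component`, `optimal_pool_sizes`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    turnaround_s: float  # seconds one process spends on one request (assumed)
    memory_gb: float     # memory footprint of one pooled process (assumed)


def optimal_pool_sizes(components, memory_gb, cores):
    """Solve the simplified single-node LP in closed form.

    For a target throughput t, the cheapest feasible pool size of
    component i is t * turnaround_i. Substituting that into the memory
    and core constraints gives two upper bounds on t; the smaller one
    is the LP optimum.
    """
    sum_time = sum(c.turnaround_s for c in components)
    sum_mem_time = sum(c.memory_gb * c.turnaround_s for c in components)
    throughput = min(memory_gb / sum_mem_time, cores / sum_time)
    pools = {c.name: throughput * c.turnaround_s for c in components}
    return throughput, pools


# Hypothetical components with very different turnaround times and footprints.
components = [
    Component("parser", turnaround_s=0.5, memory_gb=2.0),
    Component("tagger", turnaround_s=0.1, memory_gb=0.5),
    Component("ranker", turnaround_s=2.0, memory_gb=8.0),
]
throughput, pools = optimal_pool_sizes(components, memory_gb=64.0, cores=16)

for name, size in pools.items():
    print(f"{name}: {size:.2f} processes")
print(f"throughput: {throughput:.2f} requests/s")
```

In this example memory, not the core count, is the binding constraint, so the slow, memory-hungry component receives most of the budget. Real pool sizes are integers, so the fractional values would have to be rounded down (or the problem solved as an integer program), and a multi-node, heterogeneous setting like the one the paper targets needs per-node variables and constraints that this sketch omits.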
Pages: 1088-1097
Number of pages: 10
Related papers
  • [1] Low Latency and Resource-aware Program Composition for Large-scale Data Analysis
    Tanaka, Masahiro
    Taura, Kenjiro
    Torisawa, Kentaro
    2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2016, : 325 - 330
  • [2] Multi-Domain TSN Orchestration and Management for Large-Scale Industrial Networks
    Bhattacharjee, Sushmit
    Alexandris, Konstantinos
    Bauschert, Thomas
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2024, 21 (06): : 6480 - 6492
  • [3] Large-Scale Experiment for Topology-Aware Resource Management
    Georgiou, Yiannis
    Mercier, Guillaume
    Villiermet, Adele
    EURO-PAR 2017: PARALLEL PROCESSING WORKSHOPS, 2018, 10659 : 179 - 186
  • [4] GODEL: Unified Large-Scale Resource Management and Scheduling at ByteDance
    Xiang, Wu
    Li, Yakun
    Ren, Yuquan
    Jiang, Fan
    Xin, Chaohui
    Gupta, Varun
    Xiang, Chao
    Song, Xinyi
    Liu, Meng
    Li, Bing
    Shao, Kaiyang
    Xu, Chen
    Shao, Wei
    Fu, Yuqi
    Wang, Wilson
    Xu, Cong
    Xu, Wei
    Lin, Caixue
    Shi, Rui
    Liang, Yuming
    PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON CLOUD COMPUTING, SOCC 2023, 2023, : 308 - 323
  • [5] Resource Bundles: Using Aggregation for Statistical Large-Scale Resource Discovery and Management
    Cardosa, Michael
    Chandra, Abhishek
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2010, 21 (08) : 1089 - 1102
  • [6] Value of service based resource management for large-scale computing systems
    Tunc, Cihan
    Machovec, Dylan
    Kumbhare, Nirmal
    Akoglu, Ali
    Hariri, Salim
    Khemka, Bhavesh
    Siegel, Howard Jay
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2017, 20 (03): : 2013 - 2030
  • [7] Resource and Network Management Framework for a Large-Scale Satellite Communications System
    Abe, Yuma
    Ogura, Masaki
    Tsuji, Hiroyuki
    Miura, Amane
    Adachi, Shuichi
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2020, E103A (02) : 492 - 501
  • [8] Self-Adaptive Resource Management for Large-Scale Shared Clusters
    Yan Li
    Feng-Hong Chen
    Xi Sun
    Ming-Hui Zhou
    Wen-Pin Jiao
    Dong-Gang Cao
    Hong Mei
    Journal of Computer Science and Technology, 2010, 25 : 945 - 957
  • [9] Self-Adaptive Resource Management for Large-Scale Shared Clusters
    李研
    陈峰宏
    孙熙
    周明辉
    焦文品
    曹东刚
    梅宏
    Journal of Computer Science & Technology, 2010, 25 (05) : 945 - 957
  • [10] Self-Adaptive Resource Management for Large-Scale Shared Cluster
    Li, Yan
    Chen, Feng-Hong
    Sun, Xi
    Zhou, Ming-Hui
    Jiao, Wen-Pin
    Cao, Dong-Gang
    Mei, Hong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2010, 25 (05) : 945 - 957