Autonomic Resource Management for Program Orchestration in Large-scale Data Analysis

Times cited: 0
Authors
Tanaka, Masahiro [1]
Taura, Kenjiro [2]
Torisawa, Kentaro [1]
Affiliations
[1] Natl Inst Informat & Commun Technol NICT, Data Driven Intelligent Syst Res Ctr DIRECT, Universal Commun Res Inst, 3-5 Hikaridai, Seika Cho, Kyoto 6190289, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1130033, Japan
Source
2017 31ST IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS) | 2017
Keywords
Large-scale data processing; program composition; service composition; self-tuning; resource management
DOI
10.1109/IPDPS.2017.89
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Large-scale data analysis applications are becoming increasingly prevalent in a wide variety of areas. These applications are composed of many existing programs called analysis components. Thousands of analysis component processes are orchestrated on many compute nodes. This paper proposes a novel self-tuning framework for optimizing an application's throughput in large-scale data analysis. One challenge is developing efficient orchestration that takes into account the diversity of analysis components and the varying performance of compute nodes. In our previous work, we achieved such orchestration to a certain degree by introducing our own middleware, which wraps each analysis component as a remote procedure call (RPC) service. The middleware also pools the processes to reduce startup overhead, which is a serious obstacle to achieving high throughput. This work tackles the remaining task of tuning the sizes of the analysis components' process pools to maximize the application's throughput. This is challenging because analysis components differ drastically in turnaround times and memory footprints. The size of the process pool for each type of analysis component should be set with these properties in mind, as well as the constraints on both memory capacity and processor core counts. In this work, we formulate this task as a linear programming problem and obtain the optimal pool sizes by solving it. Compared to our previous work, we significantly improved the scalability of our framework by reformulating the performance model to work on hundreds of heterogeneous nodes. We also extended the service allocation mechanism to manage the computational load on each compute node and reduce communication overhead. The experimental results show that our approach scales to thousands of analysis component processes running on 200 compute nodes across three clusters. Moreover, our approach significantly reduces the memory footprint.
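To make the pool-sizing idea more concrete, here is a minimal Python sketch of a deliberately simplified, single-node linear program of this kind. It is not the authors' performance model. The throughput constraint, the parameter names (`turnaround_s`, `calls_per_item`, `memory_gb`), and the `optimal_pool_sizes` helper are all hypothetical. The sketch assumes that each input item needs `calls_per_item` calls to a component, that each call keeps one pooled process busy for `turnaround_s` seconds, and that each pooled process occupies one core and `memory_gb` of memory. Under those assumptions the program maximizes items per second `T` subject to `n_i >= T * t_i * r_i`, a memory budget, and a core budget. Extra processes only consume resources, so at the optimum every `n_i` sits at its lower bound, and this particular LP therefore has a closed-form solution.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical profile of one analysis-component type (illustrative only)."""
    name: str
    turnaround_s: float    # t_i: seconds one pooled process is busy per call (assumed)
    calls_per_item: float  # r_i: calls to this component per input item (assumed)
    memory_gb: float       # m_i: memory held by one pooled process (assumed)


def optimal_pool_sizes(components, memory_gb, cores):
    """Solve the toy single-node LP

        maximize T
        subject to n_i >= T * t_i * r_i   (each pool keeps up with throughput T)
                   sum(m_i * n_i) <= memory_gb
                   sum(n_i) <= cores      (one core per pooled process, assumed)
                   n_i >= 0

    Setting each n_i to its lower bound T * t_i * r_i turns both resource
    constraints into upper bounds on T, so the optimum is their minimum.
    Returns (T, {name: n_i}); pool sizes are fractional (LP relaxation).
    """
    # Process-seconds (= core-seconds under the one-core assumption) per item.
    work = [c.turnaround_s * c.calls_per_item for c in components]
    # GB-seconds of memory per item.
    mem_work = [w * c.memory_gb for w, c in zip(work, components)]
    throughput = min(memory_gb / sum(mem_work), cores / sum(work))
    pools = {c.name: throughput * w for c, w in zip(components, work)}
    return throughput, pools


if __name__ == "__main__":
    comps = [
        Component("parser", turnaround_s=2.0, calls_per_item=1.0, memory_gb=4.0),
        Component("tagger", turnaround_s=0.5, calls_per_item=2.0, memory_gb=1.0),
    ]
    for mem in (64.0, 32.0):
        t, pools = optimal_pool_sizes(comps, memory_gb=mem, cores=16)
        print(f"{mem:.0f} GB: {t:.2f} items/s, pools={pools}")
```

With these made-up numbers, a 16-core, 64 GB node is core-bound at about 5.33 items/s, while a 32 GB node becomes memory-bound at about 3.56 items/s, which illustrates why both turnaround times and memory footprints matter when sizing pools. The relaxation yields fractional pool sizes that a real system would have to round. Covering hundreds of heterogeneous nodes would also require per-node variables and a general LP solver instead of this closed form.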
Pages: 1088-1097
Number of pages: 10