Characterizing Distributed Machine Learning Workloads on Apache Spark (Experimentation and Deployment Paper)

Cited by: 1
Authors
Djebrouni, Yasmine [1 ]
Rocha, Isabelly [2 ]
Bouchenak, Sara [3 ]
Chen, Lydia [2 ,4 ]
Felber, Pascal [2 ]
Marangozova, Vania [1 ]
Schiavoni, Valerio [2 ]
Affiliations
[1] Univ Grenoble Alps, Grenoble, France
[2] Univ Neuchatel, Neuchatel, Switzerland
[3] INSA Lyon, Lyon, France
[4] Delft Univ Technol, Delft, Netherlands
Source
PROCEEDINGS OF THE 24TH ACM/IFIP INTERNATIONAL MIDDLEWARE CONFERENCE, MIDDLEWARE 2023 | 2023
Keywords
Distributed Machine Learning; Distributed Deep Learning; Trace Collection; Workload Characterization; Multi-level Configuration; Performance
DOI
10.1145/3590140.3629112
CLC number
TP39 [Computer Applications]
Subject classification
081203; 0835
Abstract
Distributed machine learning (DML) environments are widely used in many application domains to build decision-making systems. However, the complexity of these environments is overwhelming for novice users. On the one hand, data scientists are more familiar with hyper-parameter tuning and typically lack an understanding of the trade-offs and challenges of parameterizing DML platforms to achieve good performance. On the other hand, system administrators focus on tuning distributed platforms, unaware of the possible implications of the platform on the quality of the learning models. To shed light on such parameter configuration interplay, we run multiple DML workloads on the widely used Apache Spark distributed platform, leveraging 13 popular learning methods and 6 real-world datasets on two distinct clusters. We collect and perform an in-depth analysis of workload execution traces to compare the efficiency of different configuration strategies. We consider tuning only hyper-parameters, tuning only platform parameters, and jointly tuning both hyper-parameters and platform parameters. We publicly release our collected traces and derive key takeaways on DML workloads. Counter-intuitively, platform parameters have a higher impact on the model quality than hyper-parameters. More generally, we show that multi-level parameter configuration can provide better results in terms of model quality and execution time while also optimizing resource costs.
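The abstract's central idea of multi-level configuration, i.e. jointly tuning platform parameters and model hyper-parameters rather than either level alone, can be sketched as a joint grid search over both parameter spaces. The parameter names below (Spark executor settings, random-forest hyper-parameters) and their value ranges are illustrative assumptions, not values taken from the paper:

```python
from itertools import product

# Illustrative search spaces (assumed values, not from the paper).
platform_params = {
    "spark.executor.memory": ["4g", "8g"],
    "spark.executor.cores": [2, 4],
}
hyper_params = {
    "numTrees": [50, 100],
    "maxDepth": [5, 10],
}

def joint_grid(platform, hyper):
    """Enumerate every combined (platform, hyper-parameter) configuration."""
    keys = list(platform) + list(hyper)
    values = list(platform.values()) + list(hyper.values())
    for combo in product(*values):
        yield dict(zip(keys, combo))

configs = list(joint_grid(platform_params, hyper_params))
print(len(configs))  # 2 * 2 * 2 * 2 = 16 combined configurations
```

Each resulting dictionary mixes both levels, so a tuner evaluating these configurations would observe the platform/hyper-parameter interplay the paper studies, at the cost of a multiplicatively larger search space.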
Pages: 151 - 164
Page count: 14
Related papers
7 entries
  • [1] Model averaging in distributed machine learning: a case study with Apache Spark
    Guo, Yunyan
    Zhang, Zhipeng
    Jiang, Jiawei
    Wu, Wentao
    Zhang, Ce
    Cui, Bin
    Li, Jianzhong
    VLDB JOURNAL, 2021, 30 (04): 693 - 712
  • [2] Predicting Diabetes using Distributed Machine Learning based on Apache Spark
    Ahmed, Hager
    Younis, Eman M. G.
    Ali, Abdelmgeid A.
    PROCEEDINGS OF 2020 INTERNATIONAL CONFERENCE ON INNOVATIVE TRENDS IN COMMUNICATION AND COMPUTER ENGINEERING (ITCE), 2020, : 44 - 49
  • [3] Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads
    Jangda, Abhinav
    Huang, Jun
    Liu, Guodong
    Sabet, Amir Hossein Nodehi
    Maleki, Saeed
    Miao, Youshan
    Musuvathi, Madanlal
    Mytkowicz, Todd
    Saarikivi, Olli
    ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2022, : 402 - 416
  • [4] dSyncPS: Delayed Synchronization for Dynamic Deployment of Distributed Machine Learning
    Guo, Yibo
    Wang, An
    PROCEEDINGS OF THE 2022 2ND EUROPEAN WORKSHOP ON MACHINE LEARNING AND SYSTEMS (EUROMLSYS '22), 2022, : 79 - 86
  • [5] Distributed Deep Learning for Big Remote Sensing Data Processing on Apache Spark: Geological Remote Sensing Interpretation as a Case Study
    Long, Ao
    Han, Wei
    Huang, Xiaohui
    Li, Jiabao
    Wang, Yuewei
    Chen, Jia
    WEB AND BIG DATA, PT I, APWEB-WAIM 2023, 2024, 14331 : 96 - 110
  • [6] Screening hardware and volume factors in distributed machine learning algorithms on Spark: A design of experiments (DoE) based approach
    Rodrigues, Jairson B.
    Vasconcelos, Germano C.
    Maciel, Paulo R. M.
    COMPUTING, 2021, 103 (10) : 2203 - 2225
  • [7] Comparative Analysis on the Deployment of Machine Learning Algorithms in the Distributed Brillouin Optical Time Domain Analysis (BOTDA) Fiber Sensor
    Nordin, Nur Dalilla
    Zan, Mohd Saiful Dzulkefly
    Abdullah, Fairuz
    PHOTONICS, 2020, 7 (04)