Optimizing Machine Learning Workloads in Collaborative Environments

Cited by: 14
Authors
Derakhshan, Behrouz [1]
Mahdiraji, Alireza Rezaei [1]
Abedjan, Ziawasch [2]
Rabl, Tilmann [2,3]
Markl, Volker [1,2]
Affiliations
[1] DFKI GmbH, Kaiserslautern, Germany
[2] TU Berlin, Berlin, Germany
[3] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Source
SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2020
Keywords
OPTIMIZATION;
DOI
10.1145/3318464.3389715
CLC Number
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
Effective collaboration among data scientists results in high-quality and efficient machine learning (ML) workloads. In a collaborative environment, such as Kaggle or Google Colaboratory, users typically re-execute or modify published scripts to recreate or improve the results. This introduces many redundant data processing and model training operations. Reusing the data generated by these redundant operations leads to more efficient execution of future workloads. However, existing collaborative environments lack a data management component for storing and reusing the results of previously executed operations. In this paper, we present a system that optimizes the execution of ML workloads in collaborative environments by reusing previously performed operations and their results. We utilize a so-called Experiment Graph (EG) to store artifacts, i.e., raw and intermediate data or ML models, as vertices and the operations of ML workloads as edges. In theory, the size of EG can become unnecessarily large, while the storage budget might be limited. At the same time, for some artifacts, the combined storage and retrieval cost might outweigh the recomputation cost. To address this issue, we propose two algorithms for materializing artifacts based on their likelihood of future reuse. Given the materialized artifacts inside EG, we devise a linear-time reuse algorithm that finds the optimal execution plan for incoming ML workloads. Our reuse algorithm incurs only negligible overhead and scales to the high number of incoming ML workloads in collaborative environments. Our experiments show that we improve run time by one order of magnitude for repeated executions of workloads and by 50% for executions of modified workloads in collaborative environments.
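To make the abstract's core ideas concrete, the Python sketch below models an Experiment Graph whose vertices are artifacts and whose edges are operations, together with a greedy budget-constrained materialization heuristic and a reuse-before-recompute lookup. All names (ExperimentGraph, Artifact, materialize, compute) and the cost model are illustrative assumptions; the paper's actual materialization and linear-time reuse algorithms are more involved.

```python
# Illustrative sketch only: the class names, fields, and cost model below are
# assumptions for exposition, not the paper's actual algorithms or API.
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A vertex of the Experiment Graph: raw data, intermediate data, or a model."""
    name: str
    compute_cost: float       # estimated time to recompute from parents
    storage_cost: float       # estimated cost to store and later retrieve
    reuse_likelihood: float   # estimated probability of future reuse
    data: object = None       # payload, present only if materialized


@dataclass
class ExperimentGraph:
    artifacts: dict = field(default_factory=dict)  # name -> Artifact
    edges: dict = field(default_factory=dict)      # name -> (operation, parent names)

    def add(self, artifact, op=None, parents=()):
        self.artifacts[artifact.name] = artifact
        if op is not None:
            self.edges[artifact.name] = (op, tuple(parents))

    def compute(self, name):
        """Reuse a materialized artifact if present; otherwise recompute it
        from its parents. Source artifacts are assumed to carry their data."""
        a = self.artifacts[name]
        if a.data is not None:
            return a.data  # reuse: skip recomputation entirely
        op, parents = self.edges[name]
        return op(*(self.compute(p) for p in parents))

    def materialize(self, budget):
        """Greedy materialization under a storage budget: cache the artifacts
        with the highest expected saving per unit of storage first."""
        ranked = sorted(
            self.artifacts.values(),
            key=lambda a: a.reuse_likelihood * a.compute_cost / max(a.storage_cost, 1e-9),
            reverse=True,
        )
        for a in ranked:
            if a.data is None and a.storage_cost <= budget:
                a.data = self.compute(a.name)
                budget -= a.storage_cost


# Example: a tiny load -> featurize pipeline whose intermediate result is
# cached once and then served to later workloads without recomputation.
g = ExperimentGraph()
g.add(Artifact("raw", compute_cost=0.0, storage_cost=1.0,
               reuse_likelihood=1.0, data=[1, 2, 3]))
g.add(Artifact("features", compute_cost=30.0, storage_cost=2.0,
               reuse_likelihood=0.8),
      op=lambda xs: [x * 2 for x in xs], parents=["raw"])
g.materialize(budget=5.0)
print(g.compute("features"))  # served from the materialized copy: [2, 4, 6]
```

The ranking ratio mirrors the trade-off stated in the abstract: an artifact is worth storing only when its expected recomputation saving outweighs its storage and retrieval cost.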
Pages: 1701-1716
Page count: 16
Related Papers
50 records in total
  • [1] On Optimizing Machine Learning Workloads via Kernel Fusion
    Ashari, Arash
    Tatikonda, Shirish
    Boehm, Matthias
    Reinwald, Berthold
    Campbell, Keith
    Keenleyside, John
    Sadayappan, P.
    ACM SIGPLAN NOTICES, 2015, 50 (08): 173-182
  • [2] Optimizing Machine Learning on Apache Spark in HPC Environments
    Li, Zhenyu
    Davis, James
    Jarvis, Stephen A.
    PROCEEDINGS OF 2018 IEEE/ACM MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC 2018), 2018: 95-105
  • [3] Serving Machine Learning Workloads in Resource Constrained Environments: a Serverless Deployment Example
    Christidis, Angelos
    Davies, Roy
    Moschoyiannis, Sotiris
    2019 IEEE 12TH CONFERENCE ON SERVICE-ORIENTED COMPUTING AND APPLICATIONS (SOCA 2019), 2019: 55-63
  • [4] Optimizing Cloud Workloads: Autoscaling with Reinforcement Learning
    Mishra, Pratik
    Hans, Sandeep
    Saha, Diptikalyan
    Moogi, Pratibha
    2024 IEEE 17TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, CLOUD 2024, 2024: 217-222
  • [5] Accelerating Containerized Machine Learning Workloads
    Tariq, Ali
    Cao, Lianjie
    Ahmed, Faraz
    Rozner, Eric
    Sharma, Puneet
    PROCEEDINGS OF 2024 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, NOMS 2024, 2024
  • [6] Optimizing Deep Learning Workloads on ARM GPU with TVM
    Zheng, Lianmin
    Chen, Tianqi
    1ST ACM REQUEST WORKSHOP/TOURNAMENT ON REPRODUCIBLE SOFTWARE/HARDWARE CO-DESIGN OF PARETO-EFFICIENT DEEP LEARNING, 2018
  • [7] Federated Learning: Advancements, Applications, and Future Directions for Collaborative Machine Learning in Distributed Environments
    Katyayani, M.
    Keshamoni, Kumar
    Murthy, A. Sree Rama Chandra
    Rani, K. Usha
    Reddy, Sreenivasulu L.
    Alapati, Yaswanth Kumar
    JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (05): 165-171
  • [8] Architectural Requirements for Deep Learning Workloads in HPC Environments
    Ibrahim, Khaled Z.
    Nguyen, Tan
    Nam, Hai Ah
    Bhimji, Wahid
    Farrell, Steven
    Oliker, Leonid
    Rowan, Michael
    Wright, Nicholas J.
    Williams, Samuel
    PROCEEDINGS OF PERFORMANCE MODELING, BENCHMARKING AND SIMULATION OF HIGH PERFORMANCE COMPUTER SYSTEMS (PMBS 2021), 2021: 7-17
  • [9] Collaborative machine learning
    Hofmann, T
    Basilico, J
    FROM INTEGRATED PUBLICATION AND INFORMATION SYSTEMS TO VIRTUAL INFORMATION AND KNOWLEDGE ENVIRONMENTS: ESSAYS DEDICATED TO ERICH J NEUHOLD ON THE OCCASION OF HIS 65TH BIRTHDAY, 2005, 3379: 173-182
  • [10] Virtual Collaborative Learning Environments
    Konstantinidis, Andreas
    BULLETIN OF THE TECHNICAL COMMITTEE ON LEARNING TECHNOLOGY, 2011, 13 (03): 35-36