Optimizing Machine Learning Workloads in Collaborative Environments

Cited by: 14
Authors
Derakhshan, Behrouz [1]
Mahdiraji, Alireza Rezaei [1]
Abedjan, Ziawasch [2]
Rabl, Tilmann [2,3]
Markl, Volker [1,2]
Affiliations
[1] DFKI GmbH, Kaiserslautern, Germany
[2] TU Berlin, Berlin, Germany
[3] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Source
SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2020
Keywords
OPTIMIZATION;
DOI
10.1145/3318464.3389715
CLC Number
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
Effective collaboration among data scientists results in high-quality and efficient machine learning (ML) workloads. In a collaborative environment, such as Kaggle or Google Colaboratory, users typically re-execute or modify published scripts to recreate or improve the results. This introduces many redundant data processing and model training operations. Reusing the data generated by these redundant operations leads to more efficient execution of future workloads. However, existing collaborative environments lack a data management component for storing and reusing the results of previously executed operations. In this paper, we present a system that optimizes the execution of ML workloads in collaborative environments by reusing previously performed operations and their results. We utilize a so-called Experiment Graph (EG) to store artifacts, i.e., raw and intermediate data or ML models, as vertices and the operations of ML workloads as edges. In theory, the size of EG can become unnecessarily large, while the storage budget might be limited. At the same time, for some artifacts, the combined storage and retrieval cost might outweigh the recomputation cost. To address this issue, we propose two algorithms for materializing artifacts based on their likelihood of future reuse. Given the materialized artifacts inside EG, we devise a linear-time reuse algorithm that finds the optimal execution plan for incoming ML workloads. Our reuse algorithm incurs only negligible overhead and scales to the high number of incoming ML workloads in collaborative environments. Our experiments show that we improve run time by one order of magnitude for repeated executions of workloads and by 50% for executions of modified workloads in collaborative environments.
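To make the abstract's core ideas concrete, the Python sketch below models an Experiment Graph whose vertices are artifacts and whose edges are operations, together with a greedy budget-constrained materialization heuristic and a reuse-before-recompute lookup. All names (ExperimentGraph, Artifact, materialize, compute) and the cost model are illustrative assumptions; the paper's actual materialization and linear-time reuse algorithms are more involved.

```python
# Illustrative sketch only: the class names, fields, and cost model below are
# assumptions for exposition, not the paper's actual algorithms or API.
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """A vertex of the Experiment Graph: raw data, intermediate data, or a model."""
    name: str
    compute_cost: float       # estimated time to recompute from parents
    storage_cost: float       # estimated cost to store and later retrieve
    reuse_likelihood: float   # estimated probability of future reuse
    data: object = None       # payload, present only if materialized


@dataclass
class ExperimentGraph:
    artifacts: dict = field(default_factory=dict)  # name -> Artifact
    edges: dict = field(default_factory=dict)      # name -> (operation, parent names)

    def add(self, artifact, op=None, parents=()):
        self.artifacts[artifact.name] = artifact
        if op is not None:
            self.edges[artifact.name] = (op, tuple(parents))

    def compute(self, name):
        """Reuse a materialized artifact if present; otherwise recompute it
        from its parents. Source artifacts are assumed to carry their data."""
        a = self.artifacts[name]
        if a.data is not None:
            return a.data  # reuse: skip recomputation entirely
        op, parents = self.edges[name]
        return op(*(self.compute(p) for p in parents))

    def materialize(self, budget):
        """Greedy materialization under a storage budget: cache the artifacts
        with the highest expected saving per unit of storage first."""
        ranked = sorted(
            self.artifacts.values(),
            key=lambda a: a.reuse_likelihood * a.compute_cost / max(a.storage_cost, 1e-9),
            reverse=True,
        )
        for a in ranked:
            if a.data is None and a.storage_cost <= budget:
                a.data = self.compute(a.name)
                budget -= a.storage_cost


# Example: a tiny load -> featurize pipeline whose intermediate result is
# cached once and then served to later workloads without recomputation.
g = ExperimentGraph()
g.add(Artifact("raw", compute_cost=0.0, storage_cost=1.0,
               reuse_likelihood=1.0, data=[1, 2, 3]))
g.add(Artifact("features", compute_cost=30.0, storage_cost=2.0,
               reuse_likelihood=0.8),
      op=lambda xs: [x * 2 for x in xs], parents=["raw"])
g.materialize(budget=5.0)
print(g.compute("features"))  # served from the materialized copy: [2, 4, 6]
```

The ranking ratio mirrors the trade-off stated in the abstract: an artifact is worth storing only when its expected recomputation saving outweighs its storage and retrieval cost.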
Pages: 1701-1716
Page count: 16
Related Papers
50 records in total
  • [1] On Optimizing Machine Learning Workloads via Kernel Fusion
    Ashari, Arash
    Tatikonda, Shirish
    Boehm, Matthias
    Reinwald, Berthold
    Campbell, Keith
    Keenleyside, John
    Sadayappan, P.
    ACM SIGPLAN NOTICES, 2015, 50 (08): 173-182
  • [2] Optimizing Machine Learning on Apache Spark in HPC Environments
    Li, Zhenyu
    Davis, James
    Jarvis, Stephen A.
    PROCEEDINGS OF 2018 IEEE/ACM MACHINE LEARNING IN HPC ENVIRONMENTS (MLHPC 2018), 2018: 95-105
  • [3] Serving Machine Learning Workloads in Resource Constrained Environments: a Serverless Deployment Example
    Christidis, Angelos
    Davies, Roy
    Moschoyiannis, Sotiris
    2019 IEEE 12TH CONFERENCE ON SERVICE-ORIENTED COMPUTING AND APPLICATIONS (SOCA 2019), 2019: 55-63
  • [4] Optimizing Cloud Workloads: Autoscaling with Reinforcement Learning
    Mishra, Pratik
    Hans, Sandeep
    Saha, Diptikalyan
    Moogi, Pratibha
    2024 IEEE 17TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, CLOUD 2024, 2024: 217-222
  • [5] Accelerating Containerized Machine Learning Workloads
    Tariq, Ali
    Cao, Lianjie
    Ahmed, Faraz
    Rozner, Eric
    Sharma, Puneet
    PROCEEDINGS OF 2024 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, NOMS 2024, 2024
  • [6] Optimizing Deep Learning Workloads on ARM GPU with TVM
    Zheng, Lianmin
    Chen, Tianqi
    1ST ACM REQUEST WORKSHOP/TOURNAMENT ON REPRODUCIBLE SOFTWARE/HARDWARE CO-DESIGN OF PARETO-EFFICIENT DEEP LEARNING, 2018
  • [7] Federated Learning: Advancements, Applications, and Future Directions for Collaborative Machine Learning in Distributed Environments
    Katyayani, M.
    Keshamoni, Kumar
    Murthy, A. Sree Rama Chandra
    Rani, K. Usha
    Reddy, Sreenivasulu L.
    Alapati, Yaswanth Kumar
    JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (05): 165-171
  • [8] Architectural Requirements for Deep Learning Workloads in HPC Environments
    Ibrahim, Khaled Z.
    Nguyen, Tan
    Nam, Hai Ah
    Bhimji, Wahid
    Farrell, Steven
    Oliker, Leonid
    Rowan, Michael
    Wright, Nicholas J.
    Williams, Samuel
    PROCEEDINGS OF PERFORMANCE MODELING, BENCHMARKING AND SIMULATION OF HIGH PERFORMANCE COMPUTER SYSTEMS (PMBS 2021), 2021: 7-17
  • [9] Collaborative machine learning
    Hofmann, T
    Basilico, J
    FROM INTEGRATED PUBLICATION AND INFORMATION SYSTEMS TO VIRTUAL INFORMATION AND KNOWLEDGE ENVIRONMENTS: ESSAYS DEDICATED TO ERICH J NEUHOLD ON THE OCCASION OF HIS 65TH BIRTHDAY, 2005, 3379: 173-182
  • [10] Virtual Collaborative Learning Environments
    Konstantinidis, Andreas
    BULLETIN OF THE TECHNICAL COMMITTEE ON LEARNING TECHNOLOGY, 2011, 13 (03): 35-36