Ultron-AutoML: an open-source, distributed, scalable framework for efficient hyper-parameter optimization

Cited by: 1
Authors
Narayan, Swarnim [1 ]
Krishna, Chepuri Shri [1 ]
Mishra, Varun [1 ]
Rai, Abhinav [1 ]
Rai, Himanshu [1 ]
Bharti, Chandrakant [1 ]
Sodhi, Gursirat Singh [1 ]
Gupta, Ashish [1 ]
Singh, Nitinbalaji [1 ]
Affiliations
[1] Walmart Global Tech India, Catalog Data Sci, Bengaluru, India
Keywords
Hyperparameter optimization; Machine Learning; Deep Unsupervised/Semi-supervised/Representation/Self-supervised Learning; Search
DOI
10.1109/BigData50022.2020.9378071
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present Ultron-AutoML, an open-source, distributed framework for efficient hyper-parameter optimization (HPO) of ML models. Because hyper-parameter optimization is compute-intensive and time-consuming, the framework has been designed for reliability - the ability to successfully complete an HPO job in a multi-tenant, failure-prone environment - as well as efficiency - completing the job with minimum compute cost and wall-clock time. From a user's perspective, the framework emphasizes ease of use and customizability. The user can declaratively specify and execute an HPO job, while ancillary tasks - containerizing and running the user's scripts, model checkpointing, monitoring progress, parallelization - are handled by the framework. At the same time, the user has complete flexibility in composing the codebase that specifies the ML model training algorithm and, optionally, any custom HPO algorithm. The framework supports the creation of data pipelines that stream batches of shuffled and augmented data from a distributed file system, which is useful for training Deep Learning models based on self-supervised, semi-supervised, or representation learning algorithms over large training datasets. We demonstrate the framework's reliability and efficiency by running a BERT pre-training job over a large training corpus using preemptible GPU compute targets. Despite the inherent unreliability of the underlying compute nodes, the framework completes such long-running jobs at 30% of the cost with only a marginal increase in wall-clock time. The framework also includes a service to monitor jobs and ensure the reproducibility of any result.
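The record above gives no implementation details beyond the abstract, so the following is a minimal Python sketch of what a declarative HPO job specification in the spirit described there might look like. Every name in it (JobSpec, SearchSpace, submit_job, pretrain_bert.py, and all parameters) is a hypothetical illustration, not Ultron-AutoML's actual API.

```python
# Hypothetical sketch only: the record above does not show Ultron-AutoML's
# real interface, so every name here (JobSpec, SearchSpace, submit_job, ...)
# is an assumption made for illustration.
from dataclasses import dataclass, field


@dataclass
class SearchSpace:
    """Hyper-parameter ranges to explore, given as (low, high) bounds."""
    learning_rate: tuple = (1e-5, 1e-2)   # e.g. sampled log-uniformly
    batch_size: tuple = (16, 256)         # e.g. sampled from powers of two
    warmup_steps: tuple = (0, 10_000)


@dataclass
class JobSpec:
    """Declarative description of one HPO job, per the abstract: the user
    supplies a training script, while containerization, checkpointing,
    monitoring, and parallelization are left to the framework."""
    train_script: str                     # user's model-training entry point
    search_space: SearchSpace = field(default_factory=SearchSpace)
    max_trials: int = 64                  # total trials to evaluate
    parallelism: int = 8                  # trials run concurrently
    preemptible: bool = True              # use cheap, failure-prone nodes
    checkpoint_every_steps: int = 1_000   # resume point after a preemption


def submit_job(spec: JobSpec) -> None:
    """Stand-in for a framework client call; here it only prints the spec."""
    print(f"Submitting HPO job: {spec}")


if __name__ == "__main__":
    submit_job(JobSpec(train_script="pretrain_bert.py"))
```

Because each trial checkpoints periodically, a preempted trial can resume from its last checkpoint rather than restarting from scratch, which is consistent with the abstract's claim of completing long-running jobs on unreliable preemptible nodes at roughly 30% of the cost.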
Pages: 1584-1593
Page count: 10