Ultron-AutoML: an open-source, distributed, scalable framework for efficient hyper-parameter optimization

Cited by: 1
Authors
Narayan, Swarnim [1 ]
Krishna, Chepuri Shri [1 ]
Mishra, Varun [1 ]
Rai, Abhinav [1 ]
Rai, Himanshu [1 ]
Bharti, Chandrakant [1 ]
Sodhi, Gursirat Singh [1 ]
Gupta, Ashish [1 ]
Singh, Nitinbalaji [1 ]
Affiliations
[1] Walmart Global Tech India, Catalog Data Sci, Bengaluru, India
Keywords
Hyperparameter optimization; Machine Learning; Deep Unsupervised/Semi-supervised/Representation/Self-supervised Learning; Search
DOI
10.1109/BigData50022.2020.9378071
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present Ultron-AutoML, an open-source, distributed framework for efficient hyper-parameter optimization (HPO) of ML models. Because hyper-parameter optimization is compute-intensive and time-consuming, the framework has been designed for reliability - the ability to successfully complete an HPO job in a multi-tenant, failure-prone environment - as well as efficiency - completing the job with minimum compute cost and wall-clock time. From a user's perspective, the framework emphasizes ease of use and customizability. The user can declaratively specify and execute an HPO job, while ancillary tasks - containerizing and running the user's scripts, model checkpointing, monitoring progress, parallelization - are handled by the framework. At the same time, the user has complete flexibility in composing the codebase that specifies the ML model training algorithm and, optionally, any custom HPO algorithm. The framework supports the creation of data pipelines that stream batches of shuffled and augmented data from a distributed file system, which is useful for training Deep Learning models based on self-supervised, semi-supervised, or representation learning algorithms over large training datasets. We demonstrate the framework's reliability and efficiency by running a BERT pre-training job over a large training corpus using preemptible GPU compute targets. Despite the inherent unreliability of the underlying compute nodes, the framework completes such long-running jobs at 30% of the cost with only a marginal increase in wall-clock time. The framework also includes a service to monitor jobs and ensure the reproducibility of any result.
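The record above gives no implementation details beyond the abstract, so the following is a minimal Python sketch of what a declarative HPO job specification in the spirit described there might look like. Every name in it (JobSpec, SearchSpace, submit_job, pretrain_bert.py, and all parameters) is a hypothetical illustration, not Ultron-AutoML's actual API.

```python
# Hypothetical sketch only: the record above does not show Ultron-AutoML's
# real interface, so every name here (JobSpec, SearchSpace, submit_job, ...)
# is an assumption made for illustration.
from dataclasses import dataclass, field


@dataclass
class SearchSpace:
    """Hyper-parameter ranges to explore, given as (low, high) bounds."""
    learning_rate: tuple = (1e-5, 1e-2)   # e.g. sampled log-uniformly
    batch_size: tuple = (16, 256)         # e.g. sampled from powers of two
    warmup_steps: tuple = (0, 10_000)


@dataclass
class JobSpec:
    """Declarative description of one HPO job, per the abstract: the user
    supplies a training script, while containerization, checkpointing,
    monitoring, and parallelization are left to the framework."""
    train_script: str                     # user's model-training entry point
    search_space: SearchSpace = field(default_factory=SearchSpace)
    max_trials: int = 64                  # total trials to evaluate
    parallelism: int = 8                  # trials run concurrently
    preemptible: bool = True              # use cheap, failure-prone nodes
    checkpoint_every_steps: int = 1_000   # resume point after a preemption


def submit_job(spec: JobSpec) -> None:
    """Stand-in for a framework client call; here it only prints the spec."""
    print(f"Submitting HPO job: {spec}")


if __name__ == "__main__":
    submit_job(JobSpec(train_script="pretrain_bert.py"))
```

Because each trial checkpoints periodically, a preempted trial can resume from its last checkpoint rather than restarting from scratch, which is consistent with the abstract's claim of completing long-running jobs on unreliable preemptible nodes at roughly 30% of the cost.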
Pages: 1584-1593
Page count: 10