Accelerating Containerized Machine Learning Workloads

Cited by: 0
Authors
Tariq, Ali [1, 2]
Cao, Lianjie [2]
Ahmed, Faraz [2]
Rozner, Eric [1]
Sharma, Puneet [2]
Affiliations
[1] Univ Colorado, Boulder, CO 80309 USA
[2] Hewlett Packard Labs, Palo Alto, CA 94304 USA
Keywords
Machine Learning; Cloud Computing; Resource virtualization and management
DOI
10.1109/NOMS59830.2024.10575188
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To facilitate various Machine Learning (ML) training and inference tasks, enterprises tend to build large and expensive clusters and share them among different teams for diverse ML workloads. Virtualized platforms (containers/VMs) and schedulers are typically deployed to allow such access, manage heterogeneous resources, and schedule ML jobs in these clusters. However, allocating resource budgets for different ML jobs to achieve the best performance and cluster resource efficiency remains a significant challenge. This work proposes NEARCHUS to accelerate distributed ML training while ensuring high resource efficiency by using adaptive resource allocation. NEARCHUS automatically identifies potential performance bottlenecks for running jobs and re-allocates resources to provide optimized run-time performance with high resource efficiency. NEARCHUS's resource configuration improves the training speed of individual jobs by up to 71.4%-129.1% compared with state-of-the-art resource schedulers, and reduces job completion time and queuing time by 35.6% and 67.8%, respectively.
Pages: 10