Enhanced Scheduling of AI Applications in Multi-Tenant Cloud Using Genetic Optimizations

Cited by: 1
Authors
Kwon, Seokmin [1]
Bahn, Hyokyung [1]
Affiliations
[1] Ewha Womans Univ, Dept Comp Engn, Seoul 03760, South Korea
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 11
Keywords
task scheduling; artificial intelligence; machine learning; cloud; genetic algorithm
DOI
10.3390/app14114697
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
The artificial intelligence (AI) industry is increasingly integrating with diverse sectors such as smart logistics, FinTech, entertainment, and cloud computing. This expansion has led to the coexistence of heterogeneous applications within multi-tenant systems, presenting significant scheduling challenges. This paper addresses these challenges by exploring the scheduling of various machine learning workloads in large-scale, multi-tenant cloud systems that utilize heterogeneous GPUs. Traditional scheduling strategies often struggle to achieve satisfactory results due to low GPU utilization in these complex environments. To address this issue, we propose a novel scheduling approach that employs a genetic optimization technique, implemented within a process-oriented discrete-event simulation framework, to effectively orchestrate various machine learning tasks. We evaluate our approach using workload traces from Alibaba's MLaaS cluster with over 6000 heterogeneous GPUs. The results show that our scheduling improves GPU utilization by 12.8% compared to Round-Robin scheduling, demonstrating the effectiveness of the solution in optimizing cloud-based GPU scheduling.
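The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of how a genetic optimizer might assign tasks to heterogeneous GPUs. Everything in it is an assumption made for illustration: the names `genetic_schedule`, `makespan`, `utilization`, and `round_robin`, the use of makespan as a proxy objective, the per-GPU `speed` factors, and the toy workload. It is not the authors' method and does not use the Alibaba trace or a discrete-event simulator.

```python
import random


def gpu_loads(assign, work, speed):
    """Busy time of each GPU; assign[t] is the GPU index chosen for task t."""
    load = [0.0] * len(speed)
    for task, gpu in enumerate(assign):
        load[gpu] += work[task] / speed[gpu]
    return load


def makespan(assign, work, speed):
    """Finish time of the busiest GPU (the assumed fitness, lower is better)."""
    return max(gpu_loads(assign, work, speed))


def utilization(assign, work, speed):
    """Mean fraction of the makespan during which the GPUs are busy."""
    load = gpu_loads(assign, work, speed)
    return sum(load) / (len(load) * max(load))


def round_robin(num_tasks, num_gpus):
    """Baseline: hand tasks to GPUs in cyclic order."""
    return [t % num_gpus for t in range(num_tasks)]


def genetic_schedule(work, speed, pop_size=40, generations=200,
                     mutation_rate=0.05, seed=0):
    """Evolve a task-to-GPU assignment that minimizes makespan (illustrative)."""
    rng = random.Random(seed)
    n, m = len(work), len(speed)

    def cost(assign):
        return makespan(assign, work, speed)

    # Start from the Round-Robin baseline plus random assignments.
    pop = [round_robin(n, m)]
    pop += [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size - 1)]

    for _ in range(generations):
        pop.sort(key=cost)
        nxt = pop[:2]  # elitism: the two best assignments survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=cost)
            p2 = min(rng.sample(pop, 3), key=cost)
            # One-point crossover followed by per-gene mutation.
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(m) if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    return min(pop, key=cost)


if __name__ == "__main__":
    work = [8, 3, 5, 13, 2, 7, 9, 4, 6, 11, 1, 10]  # hypothetical task sizes
    speed = [1.0, 1.0, 2.0, 4.0]                    # hypothetical GPU speeds
    best = genetic_schedule(work, speed)
    rr = round_robin(len(work), len(speed))
    print("round-robin utilization:", utilization(rr, work, speed))
    print("genetic utilization:    ", utilization(best, work, speed))
```

Because the Round-Robin assignment is placed in the initial population and the best individuals are carried over each generation, the returned schedule's makespan is never worse than the baseline's in this sketch. Whether utilization also improves depends on the workload, and nothing here reproduces the 12.8% figure reported in the abstract.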
Pages: 15