MIGER: Integrating Multi-Instance GPU and Multi-Process Service for Deep Learning Clusters

Authors
Zhang, Bowen [1]
Li, Shuxin [1]
Li, Zhuozhao [1]
Affiliation
[1] Southern University of Science and Technology, Shenzhen, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning cluster; GPU sharing;
DOI
10.1145/3673038.3673089
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Modern NVIDIA GPUs, known for their powerful computational abilities, have been widely adopted by data centers. These GPUs often use space-sharing techniques, such as Multi-Process Service (MPS) and Multi-Instance GPU (MIG), to run multiple workloads on a GPU concurrently. However, our findings reveal that there are issues such as performance interference and inflexible resource size for these techniques when they are used individually. We present MIGER, a system that leverages both MPS and MIG techniques for online and offline jobs on modern GPUs. MIGER employs a hierarchical scheduling architecture to determine the sizes of MIG partitions, how to co-locate online and offline jobs, and the resource shares of MPS for each job to increase the throughput of offline jobs while guaranteeing the QoS requirements of online jobs. Through extensive real-cluster experiments, MIGER demonstrates a significant improvement in job completion time by 36% and 46.6% compared to the state-of-the-art MIG-based and MPS-based solutions, respectively.
Pages: 504-513
Page count: 10