Characterizing Multi-Instance GPU for Machine Learning Workloads

Times cited: 15
Authors
Li, Baolin [1]
Gadepally, Vijay [2]
Samsi, Siddharth [2]
Tiwari, Devesh [1]
Affiliations
[1] Northeastern Univ, Boston, MA 02115 USA
[2] MIT, Lincoln Lab, 244 Wood St, Lexington, MA 02173 USA
Source
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022) | 2022
Keywords
Machine Learning; GPU; Characterization;
DOI
10.1109/IPDPSW55747.2022.00124
Chinese Library Classification (CLC) number
TP3 [Computing technology; computer technology];
Discipline classification code
0812;
Abstract
As machine learning (ML) becomes more and more popular, datacenter operators use hardware accelerators such as GPUs to tackle the high computation demand of ML workloads. However, recent studies show that user-submitted jobs often underutilize the GPU streaming multiprocessor (SM) cores, resulting in hardware resource wastage. Motivated by this observation, GPU vendors have released software and hardware support for GPU resource sharing, for example, the NVIDIA Multi-Instance GPU (MIG) technique on A100 Tensor Core GPUs. In this work, we use several state-of-the-art deep learning (DL) models from various application areas to characterize the performance and energy consumption of the A100 GPU MIG mode operation. Our characterization reveals valuable insights into operating a MIG-enabled GPU datacenter.
Pages: 724-731
Page count: 8