Estimating GPU Memory Consumption of Deep Learning Models

Cited by: 79
Authors
Gao, Yanjie [1 ]
Liu, Yu [2 ]
Zhang, Hongyu [3 ]
Li, Zhengxian [1 ]
Zhu, Yonghao [1 ]
Lin, Haoxiang [1 ]
Yang, Mao [1 ]
Affiliations
[1] Microsoft Research, Beijing, China
[2] National University of Singapore and Microsoft Research, Singapore
[3] University of Newcastle, Callaghan, NSW, Australia
Source
Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '20), 2020
Keywords
deep learning; memory consumption; estimation model; program analysis; backpropagation
DOI
10.1145/3368089.3417050
Chinese Library Classification
TP31 [Computer Software]
Subject Classification
081202; 0835
Abstract
Deep learning (DL) has been increasingly adopted by a variety of software-intensive systems. Developers mainly use GPUs to accelerate the training, testing, and deployment of DL models. However, the GPU memory consumed by a DL model is often unknown to them before the DL job executes. Therefore, an improper choice of neural architecture or hyperparameters can cause such a job to run out of the limited GPU memory and fail. Our recent empirical study has found that many DL job failures are due to the exhaustion of GPU memory. This leads to a horrendous waste of computing resources and a significant reduction in development productivity. In this paper, we propose DNNMem, an accurate estimation tool for GPU memory consumption of DL models. DNNMem employs an analytic estimation approach to systematically calculate the memory consumption of both the computation graph and the DL framework runtime. We have evaluated DNNMem on 5 real-world representative models with different hyperparameters under 3 mainstream frameworks (TensorFlow, PyTorch, and MXNet). Our extensive experiments show that DNNMem is effective in estimating GPU memory consumption.
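To make the abstract's "analytic estimation" idea concrete, below is a minimal, illustrative Python sketch. It is not DNNMem's actual implementation: the function names, the Adam-style assumption of two optimizer states per weight, and the fixed 600 MiB runtime allowance are all assumptions introduced here for illustration. The sketch simply sums the bytes of weights, gradients, optimizer states, and per-batch activations, plus the runtime allowance.

import numpy as np

BYTES_PER_FLOAT32 = 4

def tensor_bytes(shape, dtype_bytes=BYTES_PER_FLOAT32):
    # Memory footprint of a single tensor in bytes.
    return int(np.prod(shape)) * dtype_bytes

def estimate_training_memory(layer_shapes, batch_size,
                             optimizer_states_per_weight=2,   # assumed Adam-like (m and v)
                             runtime_overhead_bytes=600 * 1024**2):  # assumed fixed allowance
    # Rough upper bound on GPU memory for one training step.
    # layer_shapes: list of (weight_shape, activation_shape_per_sample) pairs.
    total = runtime_overhead_bytes
    for weight_shape, act_shape in layer_shapes:
        w = tensor_bytes(weight_shape)
        total += w                                 # weights
        total += w                                 # gradients (same shapes as weights)
        total += w * optimizer_states_per_weight   # optimizer states
        total += tensor_bytes((batch_size, *act_shape))  # activations retained for backprop
    return total

# Example: a toy two-layer MLP trained with batch size 128.
layers = [((784, 4096), (4096,)), ((4096, 10), (10,))]
print(f"~{estimate_training_memory(layers, batch_size=128) / 1024**2:.1f} MiB")

Real frameworks complicate this picture with memory pools, operator workspaces, and tensor liveness, which is the gap the paper's systematic analysis of the computation graph and the framework runtime is designed to close.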
Pages: 1342-1352 (11 pages)