Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training

Cited by: 15
Authors
Choi, Hyeonseong [1 ]
Lee, Jaehwan [1 ]
Affiliations
[1] Korea Aerosp Univ, Sch Elect & Informat Engn, Goyang Si 10540, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Iss. 21
Funding
National Research Foundation of Singapore;
Keywords
deep learning; large-scale model; CUDA Unified Memory; PyTorch;
DOI
10.3390/app112110377
Chinese Library Classification (CLC)
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
To achieve high accuracy in deep learning, it is necessary to use a large-scale training model. However, due to the limited capacity of GPU memory, it is difficult to train such large-scale models on a single GPU. NVIDIA introduced CUDA Unified Memory in CUDA 6 to overcome this limitation by virtually combining GPU memory and CPU memory. In addition, CUDA 8 introduced memory advise options so that CUDA Unified Memory can be utilized more efficiently. In this work, we propose a newly optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise to each data type according to its access pattern during deep learning training. We apply CUDA Unified Memory to PyTorch to evaluate the performance of large-scale training models that rely on the expanded GPU memory, and we conduct comprehensive experiments on how to utilize Unified Memory efficiently by applying memory advises during training. As a result, when the data used for deep learning are divided into three types and a memory advise is applied to each type according to its access pattern, the deep learning execution time is reduced by 9.4% compared to the default Unified Memory.
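Illustrative code sketch
The scheme described in the abstract is implemented inside PyTorch, but its core mechanism can be shown at the CUDA runtime level. The following minimal sketch is an assumption, not the authors' implementation: it allocates three hypothetical buffers standing in for the three data types used in training (for example weights, gradients, and input batches) with cudaMallocManaged, and applies a different memory advise via cudaMemAdvise to each according to an assumed access pattern. The buffer names, sizes, and the particular advise chosen for each type are placeholders rather than the paper's exact mapping.

// Minimal sketch: Unified Memory allocation with per-data-type memory advises.
// The weights/gradients/inputs split and the chosen hints are assumptions for
// illustration, not the mapping evaluated in the paper.
#include <cstdio>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err = (call);                                          \
        if (err != cudaSuccess) {                                          \
            fprintf(stderr, "CUDA error %s at %s:%d\n",                    \
                    cudaGetErrorString(err), __FILE__, __LINE__);          \
            return 1;                                                      \
        }                                                                  \
    } while (0)

int main() {
    int gpu = 0;
    CUDA_CHECK(cudaSetDevice(gpu));

    const size_t weight_bytes = 256 << 20;  // placeholder sizes
    const size_t grad_bytes   = 256 << 20;
    const size_t input_bytes  =  64 << 20;

    float *weights, *grads, *inputs;
    CUDA_CHECK(cudaMallocManaged(&weights, weight_bytes));
    CUDA_CHECK(cudaMallocManaged(&grads,   grad_bytes));
    CUDA_CHECK(cudaMallocManaged(&inputs,  input_bytes));

    // Weights: read and updated on the GPU every iteration, so prefer to
    // keep their pages resident in GPU memory.
    CUDA_CHECK(cudaMemAdvise(weights, weight_bytes,
                             cudaMemAdviseSetPreferredLocation, gpu));

    // Gradients: produced and consumed on the GPU; additionally keep a GPU
    // mapping so accesses do not fault even if pages migrate.
    CUDA_CHECK(cudaMemAdvise(grads, grad_bytes,
                             cudaMemAdviseSetPreferredLocation, gpu));
    CUDA_CHECK(cudaMemAdvise(grads, grad_bytes,
                             cudaMemAdviseSetAccessedBy, gpu));

    // Input batches: written once on the CPU, then only read by the GPU,
    // so mark them read-mostly to allow read-duplicated copies.
    CUDA_CHECK(cudaMemAdvise(inputs, input_bytes,
                             cudaMemAdviseSetReadMostly, gpu));

    // Optionally move pages ahead of first use to avoid on-demand faults.
    CUDA_CHECK(cudaMemPrefetchAsync(weights, weight_bytes, gpu));
    CUDA_CHECK(cudaMemPrefetchAsync(inputs,  input_bytes,  gpu));
    CUDA_CHECK(cudaDeviceSynchronize());

    // ... a training loop would launch kernels on these buffers here ...

    CUDA_CHECK(cudaFree(weights));
    CUDA_CHECK(cudaFree(grads));
    CUDA_CHECK(cudaFree(inputs));
    return 0;
}

In a real training run these managed buffers would back the framework's tensors; the point of the sketch is only that each data type receives its own advise instead of leaving Unified Memory at its defaults, which is the behavior the abstract's 9.4% improvement is measured against.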
Pages: 17