Efficient Use of GPU Memory for Large-Scale Deep Learning Model Training

Cited by: 13
Authors
Choi, Hyeonseong [1 ]
Lee, Jaehwan [1 ]
Affiliations
[1] Korea Aerosp Univ, Sch Elect & Informat Engn, Goyang Si 10540, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Iss. 21
Funding
National Research Foundation of Singapore;
Keywords
deep learning; large-scale model; CUDA Unified Memory; PyTorch;
DOI
10.3390/app112110377
CLC number
O6 [Chemistry];
Discipline code
0703;
Abstract
To achieve high accuracy in deep learning, it is often necessary to use a large-scale model. However, due to the limited capacity of GPU memory, it is difficult to train large-scale models on a single GPU. NVIDIA introduced CUDA Unified Memory in CUDA 6 to overcome this limitation by virtually combining GPU memory and CPU memory, and CUDA 8 added memory advise options so that applications can use Unified Memory more efficiently. In this work, we propose a newly optimized scheme based on CUDA Unified Memory that uses GPU memory efficiently by applying a different memory advise to each data type according to its access pattern during deep learning training. We apply CUDA Unified Memory to PyTorch to evaluate the performance of large-scale models on the expanded GPU memory, and we conduct comprehensive experiments on how to utilize Unified Memory efficiently by applying memory advises during training. As a result, when the data used for deep learning are divided into three types and a memory advise is applied to each type according to its access pattern, the deep learning execution time is reduced by 9.4% compared with default Unified Memory.
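The core idea — picking a different `cudaMemAdvise` flag per data type based on its access pattern — can be illustrated with a minimal Python sketch. The three categories here (parameters, inputs, gradients) and the exact flag-to-type mapping are assumptions for illustration, not the paper's published mapping; in a real implementation the calls go through the CUDA runtime API (`cudaMallocManaged` followed by `cudaMemAdvise`), not Python.

```python
from enum import Enum


class Advise(Enum):
    """Labels standing in for real cudaMemAdvise flags."""
    PREFERRED_LOCATION_GPU = "cudaMemAdviseSetPreferredLocation"
    READ_MOSTLY = "cudaMemAdviseSetReadMostly"
    ACCESSED_BY_GPU = "cudaMemAdviseSetAccessedBy"


def advise_for(data_type: str) -> Advise:
    """Choose a memory advise for a data type by its access pattern.

    The mapping below is a hypothetical example of the general scheme
    (one advise per data type), not the paper's exact policy.
    """
    policy = {
        # Model parameters: read on the GPU every iteration,
        # so prefer keeping their pages resident in GPU memory.
        "parameters": Advise.PREFERRED_LOCATION_GPU,
        # Input batches: written once on the CPU, then only read,
        # so read-duplicated copies on both processors are cheap.
        "inputs": Advise.READ_MOSTLY,
        # Gradients: produced and consumed on the GPU each step,
        # so pre-map their pages into the GPU's page tables to
        # avoid page-fault-driven migration.
        "gradients": Advise.ACCESSED_BY_GPU,
    }
    return policy[data_type]
```

The point of the scheme is that a single advise applied uniformly (or none at all, as with default Unified Memory) cannot match per-type access patterns, which is where the reported speedup comes from.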
Pages: 17