AccTFM: An Effective Intra-Layer Model Parallelization Strategy for Training Large-Scale Transformer-Based Models

Cited by: 4
Authors
Zeng, Zihao [1 ]
Liu, Chubo [1 ]
Tang, Zhuo [1 ]
Li, Kenli [1 ]
Li, Keqin [1 ,2 ]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China
[2] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
National Natural Science Foundation of China;
Keywords
Communication hiding; deep learning; intra-layer model parallelization; quantization; top-k sparsification; communication;
DOI
10.1109/TPDS.2022.3187815
Chinese Library Classification number
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
Transformer-based deep neural networks have recently swept the field of natural language processing due to their outstanding performance, and are gradually spreading to more applications such as image and video processing. However, compared with general DNNs, training a sizeable transformer-based model is even more time-consuming and memory-hungry. Existing distributed training strategies for general DNNs are either inappropriate for or cannot efficiently handle transformer-based networks. In view of this, we propose an intra-layer model parallelization optimization strategy, AccTFM, which introduces a novel fine-grained pipeline execution and hybrid communication compression strategy to overcome the synchronization bottleneck. Specifically, on the one hand, it first decouples the inter-layer computation and communication dependencies, and then searches for the optimal partitioning strategy to maximize the overlap of computation and communication. On the other hand, the hybrid communication compression module consists of token-level top-k sparsification and piecewise quantization methods aimed at minimizing communication traffic. Experimental results show that AccTFM accelerates transformer-based DNN training by up to 2.08x compared to state-of-the-art distributed training techniques.
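The abstract names two compression techniques, token-level top-k sparsification and piecewise quantization, which together form the hybrid communication compression module. The following minimal sketch, assuming a PyTorch-style implementation, illustrates how such a compressor could be applied to activations exchanged between model-parallel ranks; the function names and the k_ratio, num_bits, and num_pieces parameters are illustrative assumptions, not the paper's actual API.

import torch


def tokenwise_topk_sparsify(acts: torch.Tensor, k_ratio: float = 0.25):
    """Keep only the top-k tokens (ranked by L2 norm) of an activation tensor.

    acts: [num_tokens, hidden_dim] activations to be exchanged between
    model-parallel ranks. Hypothetical sketch, not the paper's code.
    """
    num_tokens = acts.shape[0]
    k = max(1, int(num_tokens * k_ratio))
    token_norms = acts.norm(dim=1)               # one importance score per token
    kept_idx = torch.topk(token_norms, k).indices
    return acts[kept_idx], kept_idx              # values plus indices for later reassembly


def piecewise_quantize(x: torch.Tensor, num_bits: int = 8, num_pieces: int = 4):
    """Quantize each of `num_pieces` equal chunks with its own scale factor."""
    codes, scales = [], []
    for chunk in torch.chunk(x.flatten(), num_pieces):
        scale = chunk.abs().max().clamp(min=1e-8) / (2 ** (num_bits - 1) - 1)
        codes.append(torch.round(chunk / scale).to(torch.int8))
        scales.append(scale)
    return codes, scales


# Usage: compress an activation block before the model-parallel exchange.
acts = torch.randn(512, 1024)                    # 512 tokens, hidden size 1024
sparse_acts, kept_idx = tokenwise_topk_sparsify(acts, k_ratio=0.25)
codes, scales = piecewise_quantize(sparse_acts)

Under these assumptions the receiver would dequantize each chunk with its stored scale and scatter the retained tokens back to their original positions using kept_idx; how AccTFM coordinates this with its pipelined computation-communication overlap is described in the paper itself.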
Pages: 4326-4338
Page count: 13
Related Papers
50 items in total
  • [1] An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models
    Ganiev, Amir
    Chapin, Colt
    de Andrade, Anderson
    Liu, Chen
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 163 - 169
  • [2] Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
    Ganesh, Prakhar
    Chen, Yao
    Lou, Xin
    Khan, Mohammad Ali
    Yang, Yin
    Sajjad, Hassan
    Nakov, Preslav
    Chen, Deming
    Winslett, Marianne
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9 : 1061 - 1080
  • [3] Exploring Effective Approaches on Transformer-Based Neural Models for Multi-clinical Large-Scale Cardiotocogram Data
    Hemmi, Kazunari
    Shibata, Chihiro
    Miyata, Kohei
    Alkanan, Mohannad
    Miyamoto, Shingo
    Imamura, Toshiro
    Fukunishi, Hiroaki
    Numano, Hirotane
    ADVANCES IN DIGITAL HEALTH AND MEDICAL BIOENGINEERING, VOL 1, EHB-2023, 2024, 109 : 439 - 447
  • [4] Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
    Zhang, Minjia
    He, Yuxiong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NEURIPS 2020), 2020, 33
  • [5] Swin Transformer-Based Multiscale Attention Model for Landslide Extraction From Large-Scale Area
    Gao, Mengjie
    Chen, Fang
    Wang, Lei
    Zhao, Huichen
    Yu, Bo
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [6] TRANSFORMER IN ACTION: A COMPARATIVE STUDY OF TRANSFORMER-BASED ACOUSTIC MODELS FOR LARGE SCALE SPEECH RECOGNITION APPLICATIONS
    Wang, Yongqiang
    Shi, Yangyang
    Zhang, Frank
    Wu, Chunyang
    Chan, Julian
    Yeh, Ching-Feng
    Xiao, Alex
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6778 - 6782
  • [7] A Large-scale Non-standard English Database and Transformer-based Translation System
    Kundu, Arghya
    Uyen Trang Nguyen
    2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 2472 - 2479
  • [8] Hierarchical Graph Transformer-Based Deep Learning Model for Large-Scale Multi-Label Text Classification
    Gong, Jibing
    Teng, Zhiyong
    Teng, Qi
    Zhang, Hekai
    Du, Linfeng
    Chen, Shuai
    Bhuiyan, Md Zakirul Alam
    Li, Jianhua
    Liu, Mingsheng
    Ma, Hongyuan
    IEEE ACCESS, 2020, 8 : 30885 - 30896
  • [9] Cascaded transformer-based networks for wikipedia large-scale image-caption matching
    Messina, Nicola
    Coccomini, Davide Alessandro
    Esuli, Andrea
    Falchi, Fabrizio
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 62915 - 62935
  • [10] UAV Cross-Modal Image Registration: Large-Scale Dataset and Transformer-Based Approach
    Xiao, Yun
    Liu, Fei
    Zhu, Yabin
    Li, Chenglong
    Wang, Futian
    Tang, Jin
    ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, BICS 2023, 2024, 14374 : 166 - 176