AccTFM: An Effective Intra-Layer Model Parallelization Strategy for Training Large-Scale Transformer-Based Models

Cited by: 4
Authors
Zeng, Zihao [1 ]
Liu, Chubo [1 ]
Tang, Zhuo [1 ]
Li, Kenli [1 ]
Li, Keqin [1 ,2 ]
Affiliations
[1] Hunan Univ, Coll Informat Sci & Engn, Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China
[2] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Funding
National Natural Science Foundation of China;
Keywords
Communication hiding; deep learning; intra-layer model parallelization; quantization; top-k sparsification; communication;
DOI
10.1109/TPDS.2022.3187815
Chinese Library Classification number
TP301 [Theory and Methods];
Discipline classification code
081202;
Abstract
Transformer-based deep neural networks have recently swept the field of natural language processing due to their outstanding performance, and are gradually spreading to more applications such as image and video processing. However, compared with general DNNs, training a sizeable transformer-based model is even more time-consuming and memory-hungry. Existing distributed training strategies for general DNNs are either inappropriate for or cannot efficiently handle transformer-based networks. In view of this, we propose an intra-layer model parallelization optimization strategy, AccTFM, which introduces a novel fine-grained pipeline execution and hybrid communication compression strategy to overcome the synchronization bottleneck. Specifically, on the one hand, it first decouples the inter-layer computation and communication dependencies, and then searches for the optimal partitioning strategy to maximize the overlap of computation and communication. On the other hand, the hybrid communication compression module consists of token-level top-k sparsification and piecewise quantization methods aimed at minimizing communication traffic. Experimental results show that AccTFM accelerates transformer-based DNN training by up to 2.08x compared to state-of-the-art distributed training techniques.
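The abstract names two compression techniques, token-level top-k sparsification and piecewise quantization, which together form the hybrid communication compression module. The following minimal sketch, assuming a PyTorch-style implementation, illustrates how such a compressor could be applied to activations exchanged between model-parallel ranks; the function names and the k_ratio, num_bits, and num_pieces parameters are illustrative assumptions, not the paper's actual API.

import torch


def tokenwise_topk_sparsify(acts: torch.Tensor, k_ratio: float = 0.25):
    """Keep only the top-k tokens (ranked by L2 norm) of an activation tensor.

    acts: [num_tokens, hidden_dim] activations to be exchanged between
    model-parallel ranks. Hypothetical sketch, not the paper's code.
    """
    num_tokens = acts.shape[0]
    k = max(1, int(num_tokens * k_ratio))
    token_norms = acts.norm(dim=1)               # one importance score per token
    kept_idx = torch.topk(token_norms, k).indices
    return acts[kept_idx], kept_idx              # values plus indices for later reassembly


def piecewise_quantize(x: torch.Tensor, num_bits: int = 8, num_pieces: int = 4):
    """Quantize each of `num_pieces` equal chunks with its own scale factor."""
    codes, scales = [], []
    for chunk in torch.chunk(x.flatten(), num_pieces):
        scale = chunk.abs().max().clamp(min=1e-8) / (2 ** (num_bits - 1) - 1)
        codes.append(torch.round(chunk / scale).to(torch.int8))
        scales.append(scale)
    return codes, scales


# Usage: compress an activation block before the model-parallel exchange.
acts = torch.randn(512, 1024)                    # 512 tokens, hidden size 1024
sparse_acts, kept_idx = tokenwise_topk_sparsify(acts, k_ratio=0.25)
codes, scales = piecewise_quantize(sparse_acts)

Under these assumptions the receiver would dequantize each chunk with its stored scale and scatter the retained tokens back to their original positions using kept_idx; how AccTFM coordinates this with its pipelined computation-communication overlap is described in the paper itself.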
Pages: 4326-4338
Page count: 13
Related Papers
50 items in total
  • [1] An Architecture for Accelerated Large-Scale Inference of Transformer-Based Language Models
    Ganiev, Amir
    Chapin, Colt
    de Andrade, Anderson
    Liu, Chen
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, NAACL-HLT 2021, 2021, : 163 - 169
  • [2] Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
    Ganesh, Prakhar
    Chen, Yao
    Lou, Xin
    Khan, Mohammad Ali
    Yang, Yin
    Sajjad, Hassan
    Nakov, Preslav
    Chen, Deming
    Winslett, Marianne
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9 : 1061 - 1080
  • [3] Exploring Effective Approaches on Transformer-Based Neural Models for Multi-clinical Large-Scale Cardiotocogram Data
    Hemmi, Kazunari
    Shibata, Chihiro
    Miyata, Kohei
    Alkanan, Mohannad
    Miyamoto, Shingo
    Imamura, Toshiro
    Fukunishi, Hiroaki
    Numano, Hirotane
    ADVANCES IN DIGITAL HEALTH AND MEDICAL BIOENGINEERING, VOL 1, EHB-2023, 2024, 109 : 439 - 447
  • [4] Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
    Zhang, Minjia
    He, Yuxiong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS (NEURIPS 2020), 2020, 33
  • [5] Swin Transformer-Based Multiscale Attention Model for Landslide Extraction From Large-Scale Area
    Gao, Mengjie
    Chen, Fang
    Wang, Lei
    Zhao, Huichen
    Yu, Bo
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [6] TRANSFORMER IN ACTION: A COMPARATIVE STUDY OF TRANSFORMER-BASED ACOUSTIC MODELS FOR LARGE SCALE SPEECH RECOGNITION APPLICATIONS
    Wang, Yongqiang
    Shi, Yangyang
    Zhang, Frank
    Wu, Chunyang
    Chan, Julian
    Yeh, Ching-Feng
    Xiao, Alex
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6778 - 6782
  • [7] A Large-scale Non-standard English Database and Transformer-based Translation System
    Kundu, Arghya
    Uyen Trang Nguyen
    2023 IEEE 22ND INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, BIGDATASE, CSE, EUC, ISCI 2023, 2024, : 2472 - 2479
  • [8] Hierarchical Graph Transformer-Based Deep Learning Model for Large-Scale Multi-Label Text Classification
    Gong, Jibing
    Teng, Zhiyong
    Teng, Qi
    Zhang, Hekai
    Du, Linfeng
    Chen, Shuai
    Bhuiyan, Md Zakirul Alam
    Li, Jianhua
    Liu, Mingsheng
    Ma, Hongyuan
    IEEE ACCESS, 2020, 8 : 30885 - 30896
  • [9] Cascaded transformer-based networks for wikipedia large-scale image-caption matching
    Messina, Nicola
    Coccomini, Davide Alessandro
    Esuli, Andrea
    Falchi, Fabrizio
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (23) : 62915 - 62935
  • [10] UAV Cross-Modal Image Registration: Large-Scale Dataset and Transformer-Based Approach
    Xiao, Yun
    Liu, Fei
    Zhu, Yabin
    Li, Chenglong
    Wang, Futian
    Tang, Jin
    ADVANCES IN BRAIN INSPIRED COGNITIVE SYSTEMS, BICS 2023, 2024, 14374 : 166 - 176