LightFormer: Light-weight Transformer Using SVD-based Weight Transfer and Parameter Sharing

Citations: 0
Authors
Lu, Xiuqing [1 ]
Zhang, Peng [1 ]
Li, Sunzhu [1 ]
Gan, Guobing [1 ]
Sun, Yueheng [1 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023) | 2023
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transformer has become an important technique for natural language processing tasks with great success. However, it usually requires huge storage space and computational cost, making it difficult to be deployed on resource-constrained edge devices. To compress and accelerate Transformer, we propose LightFormer, which adopts a low-rank factorization initialized by SVD-based weight transfer and parameter sharing. The SVD-based weight transfer can effectively utilize the well-trained Transformer parameter knowledge to speed up the model convergence, and effectively alleviate the low-rank bottleneck problem combined with parameter sharing. We validate this method on machine translation, text summarization, and text classification tasks. Experiments show that on IWSLT'14 De-En and WMT'14 En-De, LightFormer achieves similar performance to the baseline Transformer with 3.8x and 1.8x fewer parameters, and achieves 2.3x speedup and 1.5x speedup respectively, generally out-performing recent light-weight Transformers.
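The low-rank initialization mentioned in the abstract can be illustrated with a generic truncated SVD. The sketch below is not the authors' implementation: the function name `svd_low_rank_init`, the matrix sizes, the rank, and the synthetic "trained" weight are all assumptions chosen for illustration, and the parameter-sharing step is not shown. It only demonstrates how a trained weight matrix could be split into two smaller factors that start close to the original, and how the parameter count changes.

```python
import numpy as np


def svd_low_rank_init(weight, rank):
    """Split a (d_out, d_in) weight into two rank-`rank` factors via truncated SVD.

    Hypothetical helper for illustration; returns A (d_out, rank) and
    B (rank, d_in) such that A @ B is the best rank-`rank` approximation
    of `weight` in the Frobenius norm.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    a = u[:, :rank] * sqrt_s           # scale columns of U
    b = sqrt_s[:, None] * vt[:rank]    # scale rows of V^T
    return a, b


# Assumed sizes; a real Transformer layer would use its own dimensions.
d_out, d_in, rank = 64, 256, 16

# Stand-in for a trained weight: a rank-16 matrix plus small noise.
rng = np.random.default_rng(0)
w = rng.standard_normal((d_out, rank)) @ rng.standard_normal((rank, d_in))
w += 0.01 * rng.standard_normal((d_out, d_in))

a, b = svd_low_rank_init(w, rank)

# Relative reconstruction error of the factorized initialization.
rel_err = np.linalg.norm(w - a @ b) / np.linalg.norm(w)

# Parameter counts before and after factorization.
dense_params = d_out * d_in              # 64 * 256 = 16384
factored_params = rank * (d_out + d_in)  # 16 * 320 = 5120

print("relative error:", rel_err)
print("dense vs. factored parameters:", dense_params, factored_params)
```

With these assumed sizes the two factors hold 5120 parameters instead of 16384. The factorization only saves parameters when the rank is below d_out * d_in / (d_out + d_in), which is the source of the low-rank bottleneck the abstract refers to.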
Pages: 10323-10335
Number of pages: 13