Pruning Residual Networks in Multilingual Neural Machine Translation to Improve Zero-Shot Translation

Times Cited: 0
Authors
Lu, Kaiwen [1 ,2 ,3 ]
Yang, Yating [1 ,2 ,3 ]
Dong, Rui [1 ,2 ,3 ]
Ma, Bo [1 ,2 ,3 ]
Wang, Lei [1 ,2 ,3 ]
Zhou, Xi [1 ,2 ,3 ]
Ahmat, Ahtamjan [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Xinjiang Tech Inst Phys & Chem, Urumqi 830011, Xinjiang, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Xinjiang Lab Minor Speech & Language Informat Pro, Urumqi 830011, Xinjiang, Peoples R China
Source
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024 | 2025, Vol. 15361
Funding
National Natural Science Foundation of China;
Keywords
Zero-shot learning; Multilingualism; Neural machine translation; Residual networks;
DOI
10.1007/978-981-97-9437-9_22
Chinese Library Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
A promising advantage of multilingual neural machine translation is that it can directly translate between language pairs not included in the supervised training data, i.e., zero-shot translation. However, such models often capture spurious correlations with the language pairs seen during training, which easily leads to off-target translation and poor translation quality. In this study, we present a novel explanation for the off-target phenomenon and investigate the influence of the encoder on zero-shot translation. Inspired by this, we systematically analyze the model from a decoupling perspective and reveal how it erroneously captures spurious correlations. Our results show that the encoder contains components that are redundant for zero-shot translation. Pruning the encoder structure significantly improves performance in the zero-shot directions while maintaining translation quality in the supervised directions. Extensive experiments on three challenging multilingual datasets demonstrate that our proposed model achieves comparable or even superior performance to a strong multilingual baseline in the zero-shot directions.
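The abstract describes pruning residual structure in the Transformer encoder to curb the spurious correlations that hurt zero-shot directions. The sketch below (PyTorch) is a minimal illustration of one way such pruning could look: a pre-norm encoder layer whose residual (skip) connection around self-attention can be switched off in selected layers. The module names, dimensions, and the choice of which layer to prune are assumptions for illustration only, not the paper's published configuration.

```python
# Minimal sketch (PyTorch): a Transformer encoder layer with an optional
# residual connection around self-attention. Which layer to prune and all
# hyperparameters below are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn


class EncoderLayer(nn.Module):
    """Pre-norm Transformer encoder layer with a prunable residual skip."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, keep_residual=True):
        super().__init__()
        self.keep_residual = keep_residual  # False = residual connection pruned
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )

    def forward(self, x):
        # Self-attention sub-layer: the residual around it is what gets
        # pruned in this illustration.
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)
        x = x + h if self.keep_residual else h
        # Feed-forward sub-layer keeps its residual connection.
        return x + self.ffn(self.norm2(x))


# Example: prune the residual in the middle layer of a 6-layer encoder
# (the layer index is an arbitrary choice for demonstration).
layers = [EncoderLayer(keep_residual=(i != 3)) for i in range(6)]
encoder = nn.Sequential(*layers)
out = encoder(torch.randn(2, 10, 512))  # (batch, seq_len, d_model)
print(out.shape)
```

Dropping a residual path in a selected encoder layer is one simple way to weaken the direct flow of source-specific information, which is the general intuition the abstract points to; the paper's actual pruning criterion and pruned components may differ.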
Pages: 280-292
Number of Pages: 13