Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation

Cited by: 1
Authors
Pang, Jianhui [1 ]
Yang, Baosong [2 ]
Wong, Derek Fai [1 ]
Wan, Yu [2 ]
Liu, Dayiheng [2 ]
Chao, Lidia Sam [1 ]
Xie, Jun [2 ]
Affiliations
[1] Univ Macau, NLP2CT Lab, Macau, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
Keywords
All Open Access; Gold;
DOI
10.1162/coli_a_00496
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The utilization of monolingual data has been shown to be a promising strategy for addressing low-resource machine translation problems. Previous studies have demonstrated the effectiveness of techniques such as back-translation and self-supervised objectives, including masked language modeling, causal language modeling, and denoising autoencoding, in improving the performance of machine translation models. However, how these methods contribute to the success of machine translation tasks and how they can be effectively combined remain under-researched questions. In this study, we carry out a systematic investigation of the effects of these techniques on linguistic properties through probing tasks, including source language comprehension, bilingual word alignment, and translation fluency. We further evaluate the impact of pre-training, back-translation, and multi-task learning on bitexts of varying sizes. Our findings inform the design of more effective pipelines for leveraging monolingual data in extremely low-resource and low-resource machine translation tasks. Experimental results show consistent performance gains in seven translation directions, providing further support for our conclusions and understanding of the role of monolingual data in machine translation.
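The abstract names back-translation as one of the monolingual-data techniques under study. As a purely illustrative aid (not code from the paper), the following minimal Python sketch shows the general idea, assuming a hypothetical reverse_model object that exposes a translate() method:

```python
# Minimal back-translation sketch. `reverse_model.translate` is a
# hypothetical stand-in for any target-to-source NMT model; it is not
# an interface defined in the paper.

def back_translate(target_monolingual, reverse_model):
    """Turn target-side monolingual sentences into synthetic bitext."""
    synthetic_bitext = []
    for tgt_sentence in target_monolingual:
        # Translate target -> source with the reverse-direction model.
        synthetic_src = reverse_model.translate(tgt_sentence)
        # Pair the synthetic source with the genuine target sentence.
        synthetic_bitext.append((synthetic_src, tgt_sentence))
    return synthetic_bitext

# The synthetic pairs are then mixed with the authentic low-resource
# bitext to train the forward source -> target translation model.
```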
Pages: 25-47
Page count: 23
Related Papers (50 in total)
  • [1] Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data
    Tonja, Atnafu Lambebo
    Kolesnikova, Olga
    Gelbukh, Alexander
    Sidorov, Grigori
    APPLIED SCIENCES-BASEL, 2023, 13 (02):
  • [2] Data Augmentation for Low-Resource Neural Machine Translation
    Fadaee, Marzieh
    Bisazza, Arianna
    Monz, Christof
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 567 - 573
  • [3] Phrase Table Induction Using Monolingual Data for Low-Resource Statistical Machine Translation
    Marie, Benjamin
    Fujita, Atsushi
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2018, 17 (03)
  • [4] Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach
    Sanchez-Cartagena, Victor M.
    Espla-Gomis, Miquel
    Antonio Perez-Ortiz, Juan
    Sanchez-Martinez, Felipe
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8502 - 8516
  • [5] A Survey on Low-Resource Neural Machine Translation
    Wang, Rui
    Tan, Xu
    Luo, Renqian
    Qin, Tao
    Liu, Tie-Yan
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 4636 - 4643
  • [6] Transformers for Low-resource Neural Machine Translation
    Gezmu, Andargachew Mekonnen
    Nuernberger, Andreas
    ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2022, : 459 - 466
  • [7] A Survey on Low-resource Neural Machine Translation
    Li H.-Z.
    Feng C.
    Huang H.-Y.
Science Press, (47): 1217 - 1231
  • [8] A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation
    Li, Yu
    Li, Xiao
    Yang, Yating
    Dong, Rui
    INFORMATION, 2020, 11 (05)
  • [9] A Hybrid Approach for Improved Low Resource Neural Machine Translation using Monolingual Data
    Abdulmumin, Idris
    Galadanci, Bashir Shehu
    Isa, Abubakar
    Kakudi, Habeebah Adamu
    Sinan, Ismaila Idris
    ENGINEERING LETTERS, 2021, 29 (04) : 1478 - 1493
  • [10] Low-Resource Neural Machine Translation with Neural Episodic Control
    Wu, Nier
    Hou, Hongxu
    Sun, Shuo
    Zheng, Wei
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,