Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

Cited by: 1
Authors
Sanchez-Cartagena, Victor M. [1]
Espla-Gomis, Miquel [1]
Perez-Ortiz, Juan Antonio [1]
Sanchez-Martinez, Felipe [1]
Affiliation
[1] Univ Alacant, Dept Llenguatges & Sistemes Informat, Valencia 03690, Spain
Keywords
Data augmentation; low-resource languages; machine translation; multi-task learning
DOI
10.1109/TPAMI.2023.3333949
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When few parallel sentences are available to train a neural machine translation system, a common practice is to generate new synthetic training samples from them. A number of approaches have been proposed to produce synthetic parallel sentences that are similar to those in the available parallel data. These approaches work under the assumption that non-fluent target-side synthetic training samples can be harmful and may deteriorate translation performance. In this paper, however, we demonstrate that synthetic training samples with non-fluent target sentences can improve translation performance if they are used in a multilingual machine translation framework as if they were sentences in another language. We conducted experiments on ten low-resource and four high-resource translation tasks and found that this simple approach consistently improves translation performance compared to state-of-the-art methods for generating synthetic training samples similar to those found in corpora. Furthermore, this improvement is independent of the size of the original training corpus, and the resulting systems are much more robust against domain shift and produce fewer hallucinations.
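The idea of treating non-fluent synthetic targets "as if they were sentences in another language" can be illustrated with a minimal sketch. It assumes the usual multilingual convention of prepending a target-language token to the source sentence. The abstract does not say which transformations produce the non-fluent targets, so the word-order reversal below, the token names `<2tgt>` and `<2synth>`, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def reverse_target(target: str) -> str:
    """Hypothetical corruption: reverse the token order of the target.

    This is an illustrative stand-in; the abstract does not specify
    which transformations are actually used.
    """
    return " ".join(reversed(target.split()))


def tag(source: str, task_token: str) -> str:
    """Prepend a pseudo-language token so one model can tell tasks apart."""
    return f"{task_token} {source}"


def build_training_set(
    pairs: List[Pair],
    transform: Callable[[str], str] = reverse_target,
    real_token: str = "<2tgt>",      # assumed tag for the real target language
    synth_token: str = "<2synth>",   # assumed tag for the synthetic "language"
) -> List[Pair]:
    """Return the original pairs plus one synthetic pair per original.

    The synthetic pair keeps the source unchanged and pairs it with a
    non-fluent target, marked with its own token as if it were a
    separate target language in a multilingual system.
    """
    tagged: List[Pair] = []
    for src, tgt in pairs:
        tagged.append((tag(src, real_token), tgt))
        tagged.append((tag(src, synth_token), transform(tgt)))
    return tagged


if __name__ == "__main__":
    corpus = [("hola mundo", "hello world")]
    for src, tgt in build_training_set(corpus):
        print(src, "=>", tgt)
```

At inference time only the real-language token would be used, so the non-fluent outputs act as an auxiliary training task and are never produced for users.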
Pages: 837-850
Number of pages: 14