Non-Fluent Synthetic Target-Language Data Improve Neural Machine Translation

Citations: 1
Authors
Sanchez-Cartagena, Victor M. [1]
Espla-Gomis, Miquel [1]
Perez-Ortiz, Juan Antonio [1]
Sanchez-Martinez, Felipe [1]
Affiliations
[1] Univ Alacant, Dept Llenguatges & Sistemes Informat, Valencia 03690, Spain
Keywords
Data augmentation; low-resource languages; machine translation; multi-task learning
DOI
10.1109/TPAMI.2023.3333949
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When the number of parallel sentences available to train a neural machine translation system is small, a common practice is to generate new synthetic training samples from them. A number of approaches have been proposed to produce synthetic parallel sentences that are similar to those in the available parallel data. These approaches work under the assumption that non-fluent target-side synthetic training samples can be harmful and may degrade translation performance. Even so, in this paper we demonstrate that synthetic training samples with non-fluent target sentences can improve translation performance if they are used in a multilingual machine translation framework as if they were sentences in another language. We conducted experiments on ten low-resource and four high-resource translation tasks and found that this simple approach consistently improves translation performance compared to state-of-the-art methods for generating synthetic training samples similar to those found in corpora. Furthermore, this improvement is independent of the size of the original training corpus, and the resulting systems are much more robust against domain shift and produce fewer hallucinations.
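The idea of treating non-fluent synthetic targets as "sentences in another language" can be illustrated with a minimal sketch. The abstract does not say how the synthetic targets are produced or how the extra language is signalled to the model, so everything below is an assumption for illustration: the tag tokens `<2tgt>` and `<2syn>`, the helper names, and the token-reversal transformation are hypothetical, and prepending a target-language tag to the source sentence is a common multilingual NMT convention, not necessarily the one used by the authors.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

# Hypothetical tag tokens: one for the real target language, one for the
# synthetic, non-fluent "language". Neither name comes from the paper.
REAL_TAG = "<2tgt>"
SYNTH_TAG = "<2syn>"


def reverse_target(target: str) -> str:
    """Illustrative transformation only: reverse the token order,
    which yields a deliberately non-fluent target sentence."""
    return " ".join(reversed(target.split()))


def build_training_set(
    pairs: List[Pair],
    transform: Callable[[str], str] = reverse_target,
) -> List[Pair]:
    """Return the original pairs plus one synthetic pair per original.

    Each source sentence is prefixed with a tag naming the "language"
    the model should produce, so the synthetic samples are handled as
    a separate translation task rather than mixed into the real one.
    """
    tagged: List[Pair] = []
    for src, tgt in pairs:
        tagged.append((f"{REAL_TAG} {src}", tgt))
        tagged.append((f"{SYNTH_TAG} {src}", transform(tgt)))
    return tagged


if __name__ == "__main__":
    corpus = [("la casa es azul", "the house is blue")]
    for src, tgt in build_training_set(corpus):
        print(src, "=>", tgt)
```

At inference time, under the same assumptions, only the real tag would be prepended, so the model is never asked to produce the non-fluent output.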
Pages: 837-850
Page count: 14