Simulated Multiple Reference Training Improves Low-Resource Machine Translation

Cited by: 0
Authors
Khayrallah, Huda [1 ]
Thompson, Brian [1 ]
Post, Matt [1 ]
Koehn, Philipp [1 ]
Affiliations
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings. We introduce Simulated Multiple Reference Training (SMRT), a novel MT training method that approximates the full space of possible translations by sampling a paraphrase of the reference sentence from a paraphraser and training the MT model to predict the paraphraser's distribution over possible tokens. We demonstrate the effectiveness of SMRT in low-resource settings when translating to English, with improvements of 1.2 to 7.0 BLEU. We also find SMRT is complementary to back-translation.
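As a rough illustration of the training objective the abstract describes (not the authors' released implementation), the sketch below assumes a PyTorch-style setup in which a paraphraser has already produced a sampled paraphrase of the reference along with its per-position token distributions; the MT model is then trained to match those distributions instead of one-hot reference tokens. All names here (smrt_distribution_loss, student_logits, paraphraser_probs) are illustrative.

```python
import torch
import torch.nn.functional as F

def smrt_distribution_loss(student_logits, paraphraser_probs):
    """Hypothetical sketch of the SMRT-style objective from the abstract.

    The MT model ("student") is trained to predict the paraphraser's
    distribution over possible tokens at each position of a paraphrase
    sampled from the reference sentence.

    student_logits:    (batch, length, vocab) raw MT-model scores
    paraphraser_probs: (batch, length, vocab) paraphraser token distributions
    """
    log_probs = F.log_softmax(student_logits, dim=-1)
    # Cross-entropy against soft targets, averaged over all target positions.
    return -(paraphraser_probs * log_probs).sum(dim=-1).mean()

# Toy check with random tensors (batch=2, length=5, vocab=100).
logits = torch.randn(2, 5, 100)
soft_targets = torch.softmax(torch.randn(2, 5, 100), dim=-1)
loss = smrt_distribution_loss(logits, soft_targets)
```

How the paraphrase is sampled, and whether this soft-target loss is combined with the standard single-reference loss, follow the paper's own description; the snippet only shows the general soft-distribution training idea.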
Pages: 82-89
Page count: 8