Simulated Multiple Reference Training Improves Low-Resource Machine Translation

Cited by: 0
Authors:
Khayrallah, Huda [1 ]
Thompson, Brian [1 ]
Post, Matt [1 ]
Koehn, Philipp [1 ]
Affiliation:
[1] Johns Hopkins Univ, Baltimore, MD 21218 USA
Keywords: none listed
DOI: none available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings. We introduce Simulated Multiple Reference Training (SMRT), a novel MT training method that approximates the full space of possible translations by sampling a paraphrase of the reference sentence from a paraphraser and training the MT model to predict the paraphraser's distribution over possible tokens. We demonstrate the effectiveness of SMRT in low-resource settings when translating to English, with improvements of 1.2 to 7.0 BLEU. We also find SMRT is complementary to back-translation.
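The objective the abstract describes — training the MT model to match the paraphraser's full distribution over tokens rather than a single one-hot reference — can be sketched as a distillation-style cross-entropy loss. The following is an illustrative reconstruction, not the authors' code: the function name `smrt_loss` and the toy random inputs standing in for the paraphraser and MT model outputs are assumptions for demonstration.

```python
import numpy as np

def smrt_loss(student_logits, teacher_probs):
    """Cross-entropy of the MT model ("student") against the paraphraser's
    ("teacher") distribution over vocabulary tokens, instead of a one-hot
    reference. Hypothetical sketch of the SMRT-style objective.

    student_logits: (seq_len, vocab) unnormalized scores from the MT model.
    teacher_probs:  (seq_len, vocab) paraphraser distribution per position.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    z = student_logits - student_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Expected negative log-likelihood under the teacher distribution,
    # averaged over target positions.
    return -(teacher_probs * log_probs).sum(axis=-1).mean()

# Toy example: vocabulary of 4 tokens, target sequence of 2 positions.
rng = np.random.default_rng(0)
teacher = rng.dirichlet(np.ones(4), size=2)  # stand-in paraphraser output
logits = rng.normal(size=(2, 4))             # stand-in MT model output
print(smrt_loss(logits, teacher))            # scalar training loss
```

With a one-hot `teacher_probs`, this reduces to the standard single-reference cross-entropy, which is why the method can be seen as a soft generalization of ordinary MT training.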
Pages: 82-89 (8 pages)