Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages

Cited by: 0
Authors
Choi, Gyu-Hyeon [1 ]
Shin, Jong-Hun [2 ]
Kim, Young-Kil [2 ]
Affiliations
[1] Korea Univ Sci & Technol UST, Daejeon, South Korea
[2] Elect & Telecommun Res Inst ETRI, Gwangju, South Korea
Keywords
Neural Machine Translation; Multi-Source Translation; Synthetic; Corpus Extension; Low-Resource;
DOI
Not available
Chinese Library Classification (CLC)
TP39 [Computer Applications];
Discipline Classification Code
081203; 0835
Abstract
In machine translation, we often collect additional resources to improve performance. However, most language pairs, such as Korean-Arabic and Korean-Vietnamese, lack the resources needed to train a machine translation system. In this paper, we propose synthetic methods for extending a low-resource corpus and apply them to a multi-source neural machine translation model. We show that corpus extension with these synthetic methods improves machine translation performance, focusing in particular on how to create source sentences that yield better target sentences. We find that corpus extension also improves the performance of multi-source neural machine translation, and that both corpus extension and the multi-source model are effective for a low-resource language pair. When the two methods are used together, translation performance improves further.
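The abstract does not spell out the extension pipeline, so the following is only a minimal sketch of the general idea, assuming a Korean-Vietnamese parallel corpus and some existing Korean-to-English translation system (translate_ko_to_en is a hypothetical stand-in, not a function from the paper): each target sentence is paired with its original Korean source plus a synthetic second source, giving a multi-source model two inputs per target.

    # Hedged sketch, not the authors' exact method: extend a low-resource
    # Korean-Vietnamese corpus with synthetic English sources so that a
    # multi-source NMT model can consume (Korean, English) -> Vietnamese.
    from typing import Callable, Iterable, List, Tuple

    def extend_corpus_multisource(
        ko_vi_pairs: Iterable[Tuple[str, str]],
        translate_ko_to_en: Callable[[str], str],
    ) -> List[Tuple[str, str, str]]:
        """Turn each (Korean, Vietnamese) pair into a
        (Korean, synthetic English, Vietnamese) triple."""
        triples = []
        for ko, vi in ko_vi_pairs:
            en_synth = translate_ko_to_en(ko)  # synthetic second source
            triples.append((ko, en_synth, vi))
        return triples

    def to_multisource_line(ko: str, en: str, vi: str,
                            sep: str = " <SEP> ") -> str:
        """One common single-encoder encoding: concatenate the two source
        sentences with a separator token; the target stays unchanged.
        Dedicated multi-source architectures would instead use one
        encoder per source language."""
        return f"{ko}{sep}{en}\t{vi}"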
Pages: 900-904
Number of pages: 5