Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages

Cited by: 0
Authors
Choi, Gyu-Hyeon [1 ]
Shin, Jong-Hun [2 ]
Kim, Young-Kil [2 ]
Affiliations
[1] Korea Univ Sci & Technol UST, Daejeon, South Korea
[2] Elect & Telecommun Res Inst ETRI, Gwangju, South Korea
Keywords
Neural Machine Translation; Multi-Source Translation; Synthetic; Corpus Extension; Low-Resource;
DOI
Not available
Chinese Library Classification (CLC)
TP39 [Applications of Computers]
Subject Classification Codes
081203; 0835
Abstract
In machine translation, we often try to collect resources to improve performance. However, many language pairs, such as Korean-Arabic and Korean-Vietnamese, do not have enough resources to train machine translation systems. In this paper, we propose synthetic methods for extending a low-resource corpus and apply them to a multi-source neural machine translation model. We show that corpus extension with the synthetic method improves machine translation performance, focusing specifically on how to create source sentences that lead to better target sentences. We find that corpus extension also improves the performance of multi-source neural machine translation, and that both corpus extension and the multi-source model are effective for a low-resource language pair. Furthermore, using the two methods together yields even better machine translation performance.
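The abstract combines two ideas: extending a small parallel corpus with synthetically created source sentences, and feeding two source-language inputs into a multi-source NMT model. As a rough illustration only (the paper's actual architecture, data, and synthetic-generation procedure are not given here), the following PyTorch sketch shows one common way to combine two source encoders, e.g., the original source and a synthetic second source, into a single decoder state. The class name MultiSourceNMT, all layer sizes, and the toy inputs are hypothetical and not taken from the paper.

# Minimal sketch of a two-encoder multi-source NMT model (assumption: the paper's
# exact model may differ; this only illustrates combining two source encoders,
# e.g., an original source and a synthetic/pivot source, before decoding).
import torch
import torch.nn as nn

class MultiSourceNMT(nn.Module):
    def __init__(self, src1_vocab, src2_vocab, tgt_vocab, emb=256, hid=512):
        super().__init__()
        self.emb1 = nn.Embedding(src1_vocab, emb)
        self.emb2 = nn.Embedding(src2_vocab, emb)
        self.emb_tgt = nn.Embedding(tgt_vocab, emb)
        self.enc1 = nn.GRU(emb, hid, batch_first=True)
        self.enc2 = nn.GRU(emb, hid, batch_first=True)
        # Combine the two encoders' final states into one decoder initial state.
        self.combine = nn.Linear(2 * hid, hid)
        self.dec = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src1, src2, tgt_in):
        _, h1 = self.enc1(self.emb1(src1))   # h1: (1, batch, hid)
        _, h2 = self.enc2(self.emb2(src2))   # h2: (1, batch, hid)
        h0 = torch.tanh(self.combine(torch.cat([h1, h2], dim=-1)))
        dec_out, _ = self.dec(self.emb_tgt(tgt_in), h0)
        return self.out(dec_out)             # (batch, tgt_len, tgt_vocab) logits

# Toy usage with random token ids (shapes only; no real data).
model = MultiSourceNMT(src1_vocab=8000, src2_vocab=8000, tgt_vocab=8000)
src1 = torch.randint(0, 8000, (2, 11))    # e.g., original source sentence ids
src2 = torch.randint(0, 8000, (2, 13))    # e.g., synthetic second-source ids
tgt_in = torch.randint(0, 8000, (2, 9))   # target tokens shifted right
logits = model(src1, src2, tgt_in)
print(logits.shape)  # torch.Size([2, 9, 8000])

In this sketch the synthetic second source would be produced offline (for example, by translating one side of the corpus through an existing system), which mirrors the abstract's focus on creating source sentences that lead to better target sentences.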
Pages: 900-904
Number of pages: 5
Related Papers (50 items)
  • [21] Transformers for Low-resource Neural Machine Translation
    Gezmu, Andargachew Mekonnen
    Nuernberger, Andreas
    ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2022, : 459 - 466
  • [22] Neighbors helping the poor: improving low-resource machine translation using related languages
    Pourdamghani, Nima
    Knight, Kevin
    MACHINE TRANSLATION, 2019, 33 (03) : 239 - 258
  • [23] Leveraging Additional Resources for Improving Statistical Machine Translation on Asian Low-Resource Languages
    Hai-Long Trieu
    Duc-Vu Tran
    Ittoo, Ashwin
    Le-Minh Nguyen
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (03)
  • [24] Multi-Source Neural Model for Machine Translation of Agglutinative Language
    Pan, Yirong
    Li, Xiao
    Yang, Yating
    Dong, Rui
    FUTURE INTERNET, 2020, 12 (06)
  • [25] Morpheme-Based Neural Machine Translation Models for Low-Resource Fusion Languages
    Gezmu, Andargachew Mekonnen
    Nuernberger, Andreas
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (09)
  • [26] Neural machine translation of low-resource languages using SMT phrase pair injection
    Sen, Sukanta
    Hasanuzzaman, Mohammed
    Ekbal, Asif
    Bhattacharyya, Pushpak
    Way, Andy
    NATURAL LANGUAGE ENGINEERING, 2021, 27 (03) : 271 - 292
  • [27] Filtered Pseudo-parallel Corpus Improves Low-resource Neural Machine Translation
    Imankulova, Aizhan
    Sato, Takayuki
    Komachi, Mamoru
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2020, 19 (02)
  • [28] Adaptive Knowledge Sharing in Multi-Task Learning: Improving Low-Resource Neural Machine Translation
    Zaremoodi, Poorya
    Buntine, Wray
    Haffari, Gholamreza
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2018, : 656 - 661
  • [29] Multi-Source Syntactic Neural Machine Translation
    Currey, Anna
    Heafield, Kenneth
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 2961 - 2966
  • [30] AAVE Corpus Generation and Low-Resource Dialect Machine Translation
    Graves, Eric
    Aswar, Shreyas
    Desai, Rujuta
    Nampelli, Srilekha
    Chakraborty, Sunandan
    Hall, Ted
    PROCEEDINGS OF THE ACM SIGCAS/SIGCHI CONFERENCE ON COMPUTING AND SUSTAINABLE SOCIETIES 2024, COMPASS 2024, 2024, : 50 - 59