Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data

Times cited: 13
Authors
Tonja, Atnafu Lambebo [1]
Kolesnikova, Olga [1]
Gelbukh, Alexander [1]
Sidorov, Grigori [1]
Affiliation
[1] Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Mexico City 07738, Mexico
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, No. 2
Keywords
Wolaytta-English NMT; English-Wolaytta NMT; low-resource NMT; self-learning; neural machine translation; monolingual data for low-resource languages
DOI
10.3390/app13021201
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Despite many proposed solutions, neural machine translation (NMT) for low-resource languages remains difficult, and the problem becomes even harder when the few available resources cover only a single domain. In this paper, we examine whether source-side monolingual data of a low-resource language can be used to improve NMT systems for that language. Our experiments use Wolaytta-English as the low-resource language pair. We apply self-learning and fine-tuning approaches to improve the NMT system for Wolaytta-English translation using both authentic and synthetic datasets. Compared with the best-performing baseline model, the self-learning approach improved BLEU scores by +2.7 for Wolaytta-English and +2.4 for English-Wolaytta translation. Further fine-tuning the best-performing self-learning model yielded +1.2 and +0.6 BLEU improvements for Wolaytta-English and English-Wolaytta translation, respectively. We conclude by reflecting on our contributions and outlining future work in this challenging area.
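In its common form, self-learning with source-side monolingual data works as follows: a model trained on authentic parallel data translates monolingual source sentences, and the resulting synthetic pairs are added to the authentic data for retraining. The sketch below assumes that generic form; it is an illustration of the technique, not the authors' implementation. The names `build_self_learning_data` and `self_learning`, the `train` callback, and the `rounds` parameter are hypothetical, and the subsequent fine-tuning stage is not shown.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]        # (source sentence, target sentence)
Model = Callable[[str], str]  # anything that maps a source sentence to a translation


def build_self_learning_data(
    authentic: List[Pair], monolingual_src: List[str], model: Model
) -> List[Pair]:
    """Translate source-side monolingual sentences with the current model and
    append the resulting synthetic pairs to the authentic parallel data."""
    synthetic = [(src, model(src)) for src in monolingual_src]
    return authentic + synthetic


def self_learning(
    authentic: List[Pair],
    monolingual_src: List[str],
    train: Callable[[List[Pair]], Model],
    rounds: int = 1,
) -> Model:
    """Generic self-learning loop (hypothetical helper, not the paper's code).

    Assumption: the baseline is trained on authentic data only, and each round
    re-labels the monolingual data with the latest model before retraining.
    """
    model = train(authentic)  # baseline model: authentic parallel data only
    for _ in range(rounds):
        model = train(build_self_learning_data(authentic, monolingual_src, model))
    return model


# Usage with a trivial stand-in "model" (str.upper) instead of a real NMT system.
data = build_self_learning_data([("a b", "A B")], ["c d", "e"], str.upper)
print(data)  # [('a b', 'A B'), ('c d', 'C D'), ('e', 'E')]
```

Keeping the authentic pairs in every round means the synthetic translations, which inherit the current model's errors, augment the trusted data rather than replace it. In a real setup, `train` would wrap an NMT toolkit; here it is left as a callback so the data flow stays visible.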
Pages: 19