An empirical study of low-resource neural machine translation of Manipuri in multilingual settings

Cited by: 12
Authors
Singh, Salam Michael [1]
Singh, Thoudam Doren [1]
Affiliations
[1] Natl Inst Technol Silchar, Dept Comp Sci & Engn, Silchar 788010, Assam, India
Keywords
Neural machine translation; Multilingual neural machine translation for low resource; Cross-lingual embedding; Manipuri;
DOI
10.1007/s00521-022-07337-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Machine translation requires a large amount of parallel data to reach production-level translation quality. This is one of the significant factors behind the lack of machine translation systems for most spoken and written languages. Manipuri is one such low-resource Indian language, with very little digital text data available. In this work, we attempt to address low-resource neural machine translation between Manipuri and English using other Indian languages in a multilingual setup. We train an LSTM-based many-to-many multilingual neural machine translation system that is infused with cross-lingual features. Experimental results show that our method improves over the vanilla many-to-many multilingual and bilingual baselines for both the Manipuri-to-English and English-to-Manipuri translation tasks. Furthermore, our method also improves over the vanilla many-to-many multilingual system for translation between all the other Indian languages and English. We also examine the generalizability of our multilingual model by evaluating zero-shot translation between language pairs that do not have a direct link and comparing it against pivot-based translation.
Pages: 14823-14844
Number of pages: 22