Improving neural machine translation for low-resource Indian languages using rule-based feature extraction

Cited by: 9
Authors
Singh, Muskaan [1]
Kumar, Ravinder [2]
Chana, Inderveer [2]
Affiliations
[1] Thapar Inst Engn & Technol, CSED, Language Engn & Machine Learning Res Labs, Patiala, Punjab, India
[2] Thapar Inst Engn & Technol, CSED, Patiala, Punjab, India
Keywords
Recurrent neural network; Linguistic feature extraction; Deep learning; Rule-based system; Sanskrit-Hindi translation; POS TAGGER; SANSKRIT
DOI
10.1007/s00521-020-04990-9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Languages help to unite the world socially, culturally and technologically. Because people communicate in different languages, there is a strong need for inter-language translation to transfer and share information and ideas. Although Sanskrit is an ancient Indo-European language, a significant amount of language-processing work is still required to explore its full potential and to open vistas in computational linguistics and computer science. In this paper, we propose and present a machine translation system for translating Sanskrit to Hindi. The developed technique uses linguistic features from a rule-based feed to train a neural machine translation system. The work is novel and applicable to any low-resource language with rich morphology. It is a generic system covering various domains with minimal human intervention. The performance of the system is analysed using automatic and linguistic measures. The results show that the proposed approach outperforms earlier work for this language pair.
Pages: 1103-1122
Page count: 20