Extraction of reordering rules for statistical machine translation

Cited by: 5
Authors
Srivastava, Jyoti [1]
Sanyal, Sudip [2]
Srivastava, Ashish Kumar [1]
Affiliations
[1] Marwadi Educ Fdn Grp Inst Rajkot, Rajkot, Gujarat, India
[2] BML Munjal Univ, Fac Comp Sci & Engn, Gurgaon, Haryana, India
Keywords
Statistical machine translation; chunk; rule extraction; reordering rules; hybrid machine translation; ENGLISH;
DOI
10.3233/JIFS-179029
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Word reordering is an important problem when translating between languages with different structures, such as Subject-Verb-Object and Subject-Object-Verb. This paper presents a statistical method that extracts linguistic rules from chunks and uses them to reorder the output of a baseline statistical machine translation (SMT) system to improve its performance. The experiments use the TDIL sample tourism corpus for the English-Hindi language pair, which consists of 1000 sentence pairs: 900 for training, 50 for tuning and 50 for testing. The output of the machine translation system, augmented by these rules, is evaluated with the BLEU and NIST metrics. The BLEU score improves by more than 2% over the baseline SMT system. The results are also compared with those of the Google translation system, which has been trained on a huge corpus; the proposed system scores 0.1 points higher on NIST than Google's system. The results are thus comparable even with a training corpus of only 900 sentence pairs. This paper is an effort to improve the performance of SMT on a small corpus by using linguistic rules that are generated automatically rather than written by linguists.
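The idea of learning chunk-level reordering rules from data can be illustrated with a minimal sketch. It is not the authors' method, which the abstract does not specify; the input format (chunk-tag sequences paired with a target-order permutation), the function names `extract_rules` and `reorder`, and the `min_count` threshold are all assumptions made for illustration.

```python
from collections import Counter


def extract_rules(pairs, min_count=2):
    """Keep the most frequent target order for each source chunk-tag sequence.

    Each item in `pairs` is (source_tags, permutation), where `permutation`
    lists source chunk indices in target-language order. This input format
    is hypothetical, not taken from the paper.
    """
    counts = Counter((tuple(tags), tuple(perm)) for tags, perm in pairs)
    rules = {}
    # most_common() yields entries from highest to lowest count, so the
    # first permutation seen for a tag sequence is its most frequent one.
    for (tags, perm), n in counts.most_common():
        if n >= min_count and tags not in rules:
            rules[tags] = perm
    return rules


def reorder(chunks, rules):
    """Apply a learned rule to a list of (tag, text) chunks, if one matches."""
    tags = tuple(tag for tag, _ in chunks)
    perm = rules.get(tags)
    if perm is None:
        # No rule for this tag sequence: leave the order unchanged.
        return [text for _, text in chunks]
    return [chunks[i][1] for i in perm]


# Toy data: the SVO pattern NP VP NP is mostly observed as SOV (0, 2, 1).
toy_pairs = [(["NP", "VP", "NP"], [0, 2, 1])] * 3 + [(["NP", "VP", "NP"], [0, 1, 2])]
toy_rules = extract_rules(toy_pairs)
```

With the toy data above, `reorder([("NP", "Ram"), ("VP", "eats"), ("NP", "mangoes")], toy_rules)` moves the verb chunk to the end, mirroring an SVO-to-SOV change; sequences without a learned rule are returned in their original order.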
Pages: 4809-4819
Number of pages: 11