Statistical-based system combination approach to gain advantages over different machine translation systems

Cited by: 14
Authors
Banik, Debajyoty [1, 2]
Ekbal, Asif [1, 2]
Bhattacharyya, Pushpak [1, 2]
Bhattacharyya, Siddhartha [3, 4]
Platos, Jan [3]
Affiliations
[1] Dept Comp Sci & Engn, Patna, Bihar, India
[2] Indian Inst Technol Patna, Patna, Bihar, India
[3] VSB Tech Univ Ostrava, Fac Elect Engn & Comp Sci, Ostrava, Czech Republic
[4] RCC Inst Informat Technol, Kolkata, India
Keywords
System combination method; Machine translation; Statistical approach; Neural machine translation (NMT); Neural network; Hierarchical machine translation (Hiero) systems; Phrase-based statistical machine translation (PBSMT)
DOI
10.1016/j.heliyon.2019.e02504
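Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract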
Every machine translation system has its own strengths. We propose an improved statistical system combination approach that exploits the strengths of existing machine translation systems. The primary task is to score all the phrases in the outputs of the machine translation systems selected for combination. The proposed approach involves three steps: alignment, decoding, and scoring. In the first step, pairwise alignment prevents duplication, so that only a single phrase is chosen from among several phrases carrying the same information. Our approach thus implements both an alignment and a scoring strategy. In the second step, hypotheses are built. In the third step, scores are calculated for all hypotheses, and the hypothesis with the highest score is chosen as the final translation. Incorrect scoring can mislead the selection of the best parts from the different systems, and a given phrase may appear in various forms in different translations. To address these challenges, we incorporate WordNet in the alignment phase and word2vec in the scoring phase, alongside the existing factors. We find that the system combination model with WordNet and word2vec improves machine translation accuracy. In this work, we combine three systems: a hierarchical machine translation system, Bing Microsoft Translate, and Google Translate. Extensive translation tests on eight language pairs with benchmark datasets demonstrate that the proposed system achieves better quality than the individual systems and state-of-the-art system combination models.
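The pipeline described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for WordNet lookups, the hand-written `VECTORS` are a hypothetical stand-in for word2vec embeddings, and the scoring rule (average cosine similarity to the other systems' outputs) is an assumption chosen for brevity. The decoding step is simplified so that each system's whole output is treated as one hypothesis.

```python
import math

# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS = {"car": "automobile", "quick": "fast"}

# Hypothetical 2-d vectors standing in for word2vec embeddings.
VECTORS = {
    "the": (1.0, 0.0),
    "automobile": (0.0, 1.0),
    "is": (1.0, 1.0),
    "fast": (0.5, 1.0),
    "slow": (-0.5, -1.0),
}


def normalize(sentence):
    """Alignment stand-in: map synonyms to a single canonical token."""
    return [SYNONYMS.get(word, word) for word in sentence.lower().split()]


def sentence_vector(tokens):
    """Average the vectors of the tokens that have an embedding."""
    known = [VECTORS[t] for t in tokens if t in VECTORS]
    if not known:
        return (0.0, 0.0)
    return tuple(sum(component) / len(known) for component in zip(*known))


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine(outputs):
    """Return the output that is, on average, most similar to the others."""
    vectors = [sentence_vector(normalize(o)) for o in outputs]

    def score(i):
        others = [cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i]
        return sum(others) / len(others) if others else 0.0

    best = max(range(len(outputs)), key=score)
    return outputs[best]
```

Calling `combine` on a list of candidate translations returns one of them unchanged. A fuller system would build new hypotheses from aligned phrases of several outputs instead of selecting a whole sentence.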
Pages: 9