A Novel Approach by Injecting CCG Supertags into an Arabic–English Factored Translation Machine

Authors
Hamdi Ahmed Rajeh
Zhiyong Li
Abdullah Mohammed Ayedh
Institutions
[1] Hunan University
[2] Central South University
Source
Arabian Journal for Science and Engineering | 2016 / Vol. 41
Keywords
Statistical machine translation; Phrase-based translation model; Combinatory Categorial Grammar; Part-of-speech; Factored translation model;
DOI
Not available
Abstract
This study addresses the integration of rich additional information into phrase-based statistical machine translation (PBSMT) through an approach aptly called factored translation, an extension of PBSMT. This approach has proven successful when translating English into morphologically rich languages. PBSMT serves as the baseline of this work. We extend the phrase-based translation approach by integrating additional linguistic knowledge, namely part-of-speech (POS) tags, to create a factored model. The main contribution of this study is a new approach to Arabic–English translation that injects Combinatory Categorial Grammar (CCG) supertags into the factored model to form an integrated model (POS + CCG). The system was trained on the freely available MultiUN corpus for the Arabic–English language pair. The Moses decoder, an open-source factored SMT system, was used to integrate these factors into the target language model and the target side of the translation model. Results showed BLEU score improvements across language models (LMs) of various high n-gram orders. The integration of the POS and CCG factors into the translation was tested successfully. Overall, BLEU evaluations with 3-, 5-, 7-, and 9-gram LMs showed that our integrated model outperformed PBSMT. Compared with the three other models (PBSMT, POS, and CCG), the integrated model improved translation quality by 1.54, 1.29, and 0.21%, respectively, with the 3-gram LM.
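To make the idea of a factored token concrete, the following is a minimal sketch, not the authors' implementation. It relies on the Moses convention of joining the factors of one token with "|" in factored training data; the factor order (0 = surface word, 1 = POS, 2 = CCG supertag), the helper names `to_factored_line`, `factor_stream`, and `ngrams`, and the example tags are all hypothetical choices made for illustration. It builds a factored English target sentence, pulls out the supertag stream that a factor-specific target LM would be trained on, and counts the n-grams of the orders evaluated in the abstract.

```python
from typing import List, Sequence, Tuple

# Moses factored corpora join the factors of one token with "|".
# The factor order used here (0 = surface, 1 = POS, 2 = CCG supertag)
# is an assumption for illustration, not the paper's documented setup.
FACTOR_SEP = "|"


def to_factored_line(words: Sequence[str], pos: Sequence[str],
                     ccg: Sequence[str]) -> str:
    """Join aligned surface/POS/CCG sequences into one factored sentence."""
    if not (len(words) == len(pos) == len(ccg)):
        raise ValueError("words, POS tags and CCG supertags must align one-to-one")
    for value in (*words, *pos, *ccg):
        # "|" separates factors and whitespace separates tokens, so
        # neither may appear inside a factor value.
        if not value or FACTOR_SEP in value or any(c.isspace() for c in value):
            raise ValueError(f"invalid factor value: {value!r}")
    return " ".join(FACTOR_SEP.join(token) for token in zip(words, pos, ccg))


def factor_stream(line: str, index: int) -> List[str]:
    """Extract one factor (e.g. the CCG supertags) from a factored sentence.

    A factor-specific target LM would be trained on streams like this.
    """
    return [token.split(FACTOR_SEP)[index] for token in line.split()]


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of `tokens` (empty if the sentence is shorter than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    # Hypothetical English target sentence with illustrative Penn POS tags
    # and CCGbank-style supertags (not output from the paper's system).
    words = ["the", "committee", "adopted", "the", "resolution"]
    pos = ["DT", "NN", "VBD", "DT", "NN"]
    ccg = ["NP[nb]/N", "N", "(S[dcl]\\NP)/NP", "NP[nb]/N", "N"]

    line = to_factored_line(words, pos, ccg)
    print(line)
    supertags = factor_stream(line, 2)
    for n in (3, 5, 7, 9):  # the LM orders evaluated in the abstract
        print(n, len(ngrams(supertags, n)))
```

Run as a script, this prints the factored line `the|DT|NP[nb]/N committee|NN|N adopted|VBD|(S[dcl]\NP)/NP the|DT|NP[nb]/N resolution|NN|N`, followed by 3 trigrams, 1 five-gram, and no 7- or 9-grams for this five-token toy sentence. CCG categories use slashes and brackets but never "|", so they can sit safely in the third factor.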
Pages: 3071–3080
Number of pages: 9