Learning Word Alignment Models for Kazakh-English Machine Translation

Cited by: 2
Authors
Kartbayev, Amandyk [1]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Lab Intelligent Informat Syst, Alma Ata, Kazakhstan
Source
INTEGRATED UNCERTAINTY IN KNOWLEDGE MODELLING AND DECISION MAKING, IUKM 2015 | 2015 / Vol. 9376
Keywords
Word alignment; Kazakh morphology; Word segmentation; Machine translation
DOI
10.1007/978-3-319-25135-6_31
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we address the most essential challenges in word alignment quality. Word alignment is widely used in the field of machine translation. However, little research has been devoted to revealing its discrete properties. This paper presents word segmentation, the probability distributions, and the statistical properties of word alignment on a transparent dataset and a real-life dataset. The results suggest that there is no single best method for alignment evaluation. For the Kazakh-English pair, we attempted to improve the phrase tables through the choice of alignment method, which needs to be adapted to the requirements of the specific project. Experimental results show that the processed parallel data reduced the word alignment error rate and achieved the highest BLEU improvement on the random parallel corpora.
Pages: 326-335
Page count: 10