On the application of different evolutionary algorithms to the alignment problem in statistical machine translation

Cited by: 7
Authors
Rodriguez, Luis [1]
Garcia-Varea, Ismael [1]
Gamez, Jose A. [1]
Affiliations
[1] Univ Castilla La Mancha, Dept Sistemas Informat SIMD, Albacete 02071, Spain
Keywords
evolutionary algorithms; estimation of distribution algorithms; statistical machine translation; statistical alignments
DOI
10.1016/j.neucom.2007.10.006
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In statistical machine translation, an alignment defines a mapping between the words in the source sentence and the words in the target sentence. Alignments are used, on the one hand, to train the statistical models and, on the other, during the decoding process to link the words in the source sentence to the words in the partial hypotheses generated. In both cases, the quality of the alignments is crucial for the success of the translation process. In this paper, we propose several evolutionary algorithms for computing alignments between two sentences in a parallel corpus. These algorithms have been tested on different tasks involving different pairs of languages. Specifically, in the two shared tasks proposed at HLT-NAACL 2003 and ACL 2005, the EDA-based algorithm outperforms the best participant systems. In addition, the experiments show that, because of the limitations of the well-known statistical alignment models, further improvements in alignment quality cannot be achieved by improving the search algorithms alone. (c) 2007 Elsevier B.V. All rights reserved.
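To make the idea of an EDA-based alignment search concrete, the following is a minimal sketch and not the authors' implementation. It assumes a UMDA-style estimation of distribution algorithm (one independent categorical distribution per target position) and a toy fitness that multiplies lexical probabilities; the record does not state which EDA variant or which alignment model the paper uses. The names `score`, `umda_align`, `t_prob`, and the three-word lexicon are all hypothetical.

```python
import random


def score(alignment, src, tgt, t_prob, floor=1e-6):
    """Toy fitness (assumption): product of lexical probabilities t(tgt | src)."""
    p = 1.0
    for j, i in enumerate(alignment):
        p *= t_prob.get((tgt[j], src[i]), floor)
    return p


def umda_align(src, tgt, t_prob, pop_size=200, n_best=20, generations=20, seed=0):
    """Search for a[j] = i, the source position linked to target position j."""
    rng = random.Random(seed)
    src = ["NULL"] + list(src)  # position 0 stands for the empty word
    n_src = len(src)
    # One categorical distribution per target position, initially uniform.
    probs = [[1.0 / n_src] * n_src for _ in tgt]
    best, best_score = None, -1.0
    for _ in range(generations):
        # Sample a population of alignments from the current distributions.
        population = [
            [rng.choices(range(n_src), weights=p)[0] for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=lambda a: score(a, src, tgt, t_prob), reverse=True)
        selected = population[:n_best]
        top_score = score(selected[0], src, tgt, t_prob)
        if top_score > best_score:
            best, best_score = selected[0], top_score
        # Re-estimate each marginal from the selected alignments (add-one smoothing).
        for j in range(len(tgt)):
            counts = [1.0] * n_src
            for a in selected:
                counts[a[j]] += 1.0
            total = sum(counts)
            probs[j] = [c / total for c in counts]
    return best


# Hypothetical toy lexicon, keyed as (target word, source word).
src = ["la", "casa", "verde"]
tgt = ["the", "green", "house"]
t_prob = {("the", "la"): 0.9, ("green", "verde"): 0.9, ("house", "casa"): 0.9}
alignment = umda_align(src, tgt, t_prob)
```

Under this toy score the best possible alignment is `[1, 3, 2]` ("the" to "la", "green" to "verde", "house" to "casa"). Because the marginals are independent, a sketch like this cannot capture dependencies between links such as distortion or fertility, which richer alignment models account for.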
Pages: 755-765
Page count: 11