A systematic comparison of various statistical alignment models

被引:73
作者
Och, FJ
Ney, H
机构
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[2] RWTH Aachen Univ Technol, Dept Comp Sci, Lehrstuhl Informat 6, D-52056 Aachen, Germany
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to performa symmetrization of directed statistical alignment models. As evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.
引用
收藏
页码:c / 51
页数:33
相关论文
共 36 条
[11]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[12]  
DIAB M, 2000, ACL 2000 WORKSH WORD, P1
[13]   MEASURES OF THE AMOUNT OF ECOLOGIC ASSOCIATION BETWEEN SPECIES [J].
DICE, LR .
ECOLOGY, 1945, 26 (03) :297-302
[14]  
GARCIAVAREA I, 1998, P INT C SPOK LANG PR, P1235
[15]  
Germann U, 2001, 39TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, P228
[16]  
Huang JX, 2000, 38TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE, P392
[17]  
Ker SJ, 1997, COMPUT LINGUIST, V23, P313
[18]  
Knight K, 1999, COMPUT LINGUIST, V25, P607
[19]  
Knight K., 1999, STAT MT TUTORIAL WOR
[20]  
Melamed I. D., 1998, 9807 I RES COGN SCI