A Hybrid Approach for Word Alignment in English-Hindi Parallel Corpora with Scarce Resources

Times cited: 4
Authors
Srivastava, Jyoti [1]
Sanyal, Sudip [1]
Affiliation
[1] Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
Source
2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012) | 2012
Keywords
Word alignment; Statistical Machine Translation; POS tagger; Scarce resources
DOI
10.1109/IALP.2012.13
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents an approach that improves word alignment for the English-Hindi language pair when resources are scarce. We improve the performance of IBM Models 1 and 2 by applying part-of-speech (POS) tags before computing word alignment probabilities. With IBM Model 1, precision, recall, and F-measure increase by approximately 15%, 11%, and 14%, respectively, and the Alignment Error Rate (AER) decreases by approximately 14%. With IBM Model 2, precision, recall, and F-measure each increase by approximately 6%, and AER decreases by approximately 6%. The experiments are based on the TDIL corpus.
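The abstract does not say exactly how the POS tags enter the alignment computation. The following is a minimal sketch under one plausible reading: POS tags act as a hard filter, so a Hindi word may only align to English words with a compatible tag, and IBM Model 1 is then trained by EM over the filtered candidates. The tag-compatibility map `COMPATIBLE`, the coarse tag names, the helper functions (`train_model1`, `viterbi_align`, `evaluate`) and the romanized toy sentences are all hypothetical, and the NULL word is omitted. Only Model 1 is shown; Model 2 would add a position-dependent alignment distribution. `evaluate` uses the usual sure/possible-link definitions of precision, recall, and AER (Och and Ney), which may differ from the paper's exact evaluation setup.

```python
from collections import defaultdict

# HYPOTHETICAL English-tag -> allowed-Hindi-tags map (coarse UD-style tags).
# The paper does not publish its tag set or filter; this is an assumption.
COMPATIBLE = {
    "NOUN": {"NOUN", "PROPN"},
    "PROPN": {"NOUN", "PROPN"},
    "VERB": {"VERB", "AUX"},
    "ADJ": {"ADJ"},
    "ADV": {"ADV"},
}


def compatible(e_tag, h_tag):
    """True if an English tag may align to a Hindi tag; unmapped tags are unrestricted."""
    allowed = COMPATIBLE.get(e_tag)
    return allowed is None or h_tag in allowed


def candidate_positions(eng, h_tag, use_pos):
    """English positions that a Hindi word tagged h_tag may align to."""
    positions = [i for i, (_, e_tag) in enumerate(eng)
                 if not use_pos or compatible(e_tag, h_tag)]
    return positions or list(range(len(eng)))  # no compatible word: allow all


def train_model1(corpus, iterations=10, use_pos=True):
    """EM for IBM Model 1; returns t[(h, e)], an estimate of P(h | e).

    corpus: list of (eng, hin) sentence pairs, each a list of (word, tag).
    The NULL word is omitted to keep the sketch short.
    """
    t = defaultdict(lambda: 1.0)  # uniform start; the E-step normalises it
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(h, e)
        total = defaultdict(float)  # expected counts c(e)
        for eng, hin in corpus:
            for h, h_tag in hin:
                # One entry per English position, as in Model 1's sum over i.
                cands = [eng[i][0] for i in candidate_positions(eng, h_tag, use_pos)]
                z = sum(t[(h, e)] for e in cands)
                for e in cands:
                    frac = t[(h, e)] / z  # posterior probability of this link
                    count[(h, e)] += frac
                    total[e] += frac
        # M-step: renormalise so that the sum over h of t(h | e) is 1 for each e.
        t = defaultdict(float, {(h, e): c / total[e] for (h, e), c in count.items()})
    return t


def viterbi_align(t, eng, hin, use_pos=True):
    """Most probable English position for each Hindi word, as (eng_idx, hin_idx)."""
    links = set()
    for j, (h, h_tag) in enumerate(hin):
        positions = candidate_positions(eng, h_tag, use_pos)
        best = max(positions, key=lambda i: t.get((h, eng[i][0]), 0.0))
        links.add((best, j))
    return links


def evaluate(predicted, sure, possible):
    """Precision, recall, F-measure and AER; `possible` should contain `sure`."""
    a_s = len(predicted & sure)
    a_p = len(predicted & possible)
    precision = a_p / len(predicted) if predicted else 0.0
    recall = a_s / len(sure) if sure else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = len(predicted) + len(sure)
    aer = 1 - (a_s + a_p) / denom if denom else 0.0
    return precision, recall, f, aer


if __name__ == "__main__":
    # Toy romanized sentence pairs, made up for illustration.
    corpus = [
        ([("boy", "NOUN"), ("eats", "VERB")],
         [("ladka", "NOUN"), ("khata", "VERB"), ("hai", "AUX")]),
        ([("girl", "NOUN"), ("eats", "VERB")],
         [("ladki", "NOUN"), ("khati", "VERB"), ("hai", "AUX")]),
    ]
    t = train_model1(corpus, iterations=5)
    eng, hin = corpus[0]
    pred = viterbi_align(t, eng, hin)  # {(0, 0), (1, 1), (1, 2)}
    print(evaluate(pred, sure={(0, 0), (1, 1)}, possible={(0, 0), (1, 1), (1, 2)}))
    # prints (1.0, 1.0, 1.0, 0.0)
```

The hard filter is the simplest way to let POS information prune the candidate links before EM runs. A softer alternative would be to multiply t(h | e) by a tag-pair probability instead of zeroing incompatible pairs; which variant the paper uses is not stated in the abstract.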
Pages: 185-188
Number of pages: 4