A Hybrid Approach for Word Alignment in English-Hindi Parallel Corpora with Scarce Resources

Cited by: 4

Authors:

Srivastava, Jyoti [1]

Sanyal, Sudip [1]

Affiliations:

[1] Indian Inst Informat Technol, Allahabad, Uttar Pradesh, India

Source:

2012 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2012) | 2012

Keywords:

Word alignment; Statistical Machine Translation; POS tagger; Scarce resources

DOI:

10.1109/IALP.2012.13

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents an approach that improves word alignment performance for the English-Hindi language pair when resources are scarce. We improve the performance of IBM Models 1 and 2 by applying part-of-speech (POS) tags before computing word alignment probabilities. With IBM Model 1, precision, recall, and F-measure increase by approximately 15%, 11%, and 14% respectively, and the Alignment Error Rate (AER) falls by approximately 14%. With IBM Model 2, precision, recall, and F-measure each increase by approximately 6%, and the AER falls by approximately 6%. The experiments are based on the TDIL corpus.
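The four metrics reported in the abstract can be illustrated with a minimal sketch. It assumes the widely used definitions over sets of alignment links (sure links S, possible links P, predicted links A); this record does not state which exact formulation the authors used, and the function name `alignment_scores` and the toy links below are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# A link is (source word index, target word index). Illustrative type only.
Link = Tuple[int, int]


def alignment_scores(
    predicted: Set[Link],
    sure: Set[Link],
    possible: Optional[Set[Link]] = None,
) -> Dict[str, float]:
    """Precision, recall, F-measure and AER over alignment link sets.

    Assumption: if no separate 'possible' set is given, every gold link
    is treated as sure, in which case AER equals 1 - F-measure.
    """
    # Possible links always include the sure links.
    possible = sure if possible is None else (possible | sure)

    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)

    precision = hits_possible / len(predicted) if predicted else 0.0
    recall = hits_sure / len(sure) if sure else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    total = len(predicted) + len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / total if total else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "aer": aer,
    }


if __name__ == "__main__":
    # Toy example: 3 predicted links, 4 gold links, 2 in common.
    gold = {(0, 0), (1, 1), (2, 2), (3, 3)}
    guess = {(0, 0), (1, 1), (2, 3)}
    for name, value in alignment_scores(guess, gold).items():
        print(f"{name}: {value:.4f}")
```

Under the all-links-sure assumption, a rise in F-measure is mirrored by an equal fall in AER, which matches the paired 14% and 6% figures quoted in the abstract.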


Pages: 185-188

Page count: 4

