Automatic Dictionary Expansion Using Non-parallel Corpora

Times cited: 3
Authors
Rapp, Reinhard [1]
Zock, Michael [1]
Affiliations
[1] Univ Rovira & Virgili, GRLMC, Tarragona, Spain
Source
ADVANCES IN DATA ANALYSIS, DATA HANDLING AND BUSINESS INTELLIGENCE | 2010
Keywords
Comparable corpora; Dictionary generation; Multilingual texts; Word translations;
DOI
10.1007/978-3-642-01044-6_29
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Automatically generating bilingual dictionaries from parallel, manually translated texts is a well-established technique that works well in practice. However, parallel texts are a scarce resource. Therefore, it is also desirable to be able to generate dictionaries from pairs of comparable monolingual corpora. For most languages, such corpora are much easier to acquire, and often in considerably larger quantities. In this paper we present the implementation of an algorithm which exploits such corpora with good success. Based on the assumption that the co-occurrence patterns in different languages are related, it expands a small base lexicon. For improved performance, it also implements a novel interlingua approach. That is, if corpora of more than two languages are available, the translations from one language to another can be determined not only directly, but also indirectly via a pivot language.
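The co-occurrence idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cooccurrence_vectors`, `rank_translations`), the window size, the cosine similarity measure, and the toy German/English sentences are all assumptions made for illustration. The sketch maps a source word's context counts through a small seed lexicon and ranks unknown target words by how similar their own context counts are. The pivot-language step is not shown.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_translations(word, src_vectors, tgt_vectors, seed_lexicon):
    """Rank target words not yet in the lexicon as translations of `word`."""
    # Express the source word's contexts in target-language terms.
    mapped = Counter()
    for ctx, count in src_vectors[word].items():
        if ctx in seed_lexicon:
            mapped[seed_lexicon[ctx]] += count
    known = set(seed_lexicon.values())
    scored = []
    for candidate in tgt_vectors:
        if candidate in known:
            continue
        # Compare only on dimensions covered by the seed lexicon.
        restricted = {k: v for k, v in tgt_vectors[candidate].items() if k in known}
        scored.append((cosine(mapped, restricted), candidate))
    return sorted(scored, reverse=True)


# Toy, non-parallel corpora (hypothetical data, not from the paper).
german = [["hund", "bellt", "laut"], ["katze", "miaut", "leise"], ["hund", "bellt"]]
english = [["the", "dog", "barks"], ["a", "cat", "meows", "softly"],
           ["dog", "barks", "loudly"]]
seed = {"bellt": "barks", "miaut": "meows", "laut": "loudly", "leise": "softly"}

de_vecs = cooccurrence_vectors(german)
en_vecs = cooccurrence_vectors(english)
print(rank_translations("hund", de_vecs, en_vecs, seed)[0][1])
```

In an iterative setting, the top-ranked pair could be added to the seed lexicon and the ranking repeated, which is one way to read "expands a small base lexicon"; the record itself does not specify the procedure.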
Pages: 317 / +
Page count: 2