An unsupervised & statistical word sense tagging using bilingual sources

Cited by: 0
Authors
Oliveira, F [1 ]
Wong, F [1 ]
Li, YP [1 ]
Affiliation
[1] Univ Macau, Fac Sci & Technol, Macao, Peoples R China
Source
Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9 | 2005
Keywords
word sense tagging; machine translation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents an approach for choosing the correct translation of an ambiguous word in a given sentence. Unsupervised learning is applied, and a non-aligned Portuguese-Chinese bilingual corpus is used to disambiguate word senses. Relationships between words are identified by considering each word's surrounding words and their relative distances, which helps capture syntactic relationships. All related words are then translated into the target language to find the correct senses of the ambiguous words. Selection is based on a statistical and mathematical model that assigns a score to each of the senses identified previously. After all the senses have been discovered, their semantic and syntactic information is converted into a set of rules and stored in a database for later use in the disambiguation process. Preliminary experimental results for the proposed method show an improvement of 6% over the baseline method in correctly assigning the corresponding translation.
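The abstract does not give the scoring model itself, so the following is only a minimal sketch of the general idea it describes: context words near the ambiguous word are translated, and each candidate translation is scored by how strongly it co-occurs with those translations in target-language text, with closer context words counting more. The function names, the inverse-distance weighting, the window size, and all toy data (lexicon entries and co-occurrence counts) are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def score_senses(tokens, target_idx, candidates, lexicon, cooc, window=3):
    """Score each candidate translation of tokens[target_idx].

    lexicon: source word -> list of target-language translations (hypothetical).
    cooc:    (candidate, target word) -> co-occurrence count (hypothetical).
    """
    scores = defaultdict(float)
    lo = max(0, target_idx - window)
    hi = min(len(tokens), target_idx + window + 1)
    for i in range(lo, hi):
        if i == target_idx:
            continue
        # Assumed weighting: evidence decays with distance from the target word.
        weight = 1.0 / abs(i - target_idx)
        translations = lexicon.get(tokens[i], [])
        for cand in candidates:
            # Use the strongest co-occurrence among the context word's translations.
            best = max((cooc.get((cand, t), 0) for t in translations), default=0)
            scores[cand] += weight * best
    return dict(scores)


def pick_translation(tokens, target_idx, candidates, lexicon, cooc, window=3):
    """Return the candidate translation with the highest score."""
    scores = score_senses(tokens, target_idx, candidates, lexicon, cooc, window)
    return max(candidates, key=lambda c: scores.get(c, 0.0))


# Toy data, invented for illustration: Portuguese "banco" (bank or bench).
tokens = ["depositei", "dinheiro", "no", "banco"]
candidates = ["银行", "长凳"]  # "bank", "bench"
lexicon = {"depositei": ["存"], "dinheiro": ["钱"]}
cooc = {("银行", "钱"): 12, ("银行", "存"): 9, ("长凳", "钱"): 1}

print(pick_translation(tokens, 3, candidates, lexicon, cooc))
```

In this toy sentence the money-related context favors the "bank" reading. A fuller system along the lines the abstract describes would also store the winning context patterns as rules for later reuse, which this sketch leaves out.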
Pages: 3749-3754
Page count: 6