Unsupervised word-sense disambiguation using bilingual comparable corpora

Cited by: 6
Authors
Kaji, H [1]
Morimoto, Y [1]
Affiliation
[1] Hitachi Ltd, Cent Res Lab, Kokubunji, Tokyo 1858601, Japan
Keywords
word-sense disambiguation; unsupervised learning; comparable corpora
DOI
10.1093/ietisy/E88-D.2.289
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns the word associations by consulting a bilingual dictionary and calculates the correlation between the senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome both the disparity in topical coverage between the corpora of the two languages and the ambiguity in word-association alignment, an algorithm was devised that iteratively calculates a sense-vs.-clue correlation matrix for each target word. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes a score, i.e., a weighted sum of the correlations between that sense and the clues appearing in the context of the instance. An experiment using the Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance: the F-measure of its sense selection was 74.6%, compared with a baseline of 62.8%. The method could possibly be extended into a fully unsupervised method featuring automatic division and definition of word senses.
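The final sense-selection step described in the abstract (pick the sense that maximizes a weighted sum of sense-vs.-clue correlations over the clues in the context) can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: it assumes the iteratively computed correlation matrix is already available as a nested dictionary, and the names `select_sense` and `clue_weights`, the default weight of 1.0, and the toy correlation values are hypothetical illustrations.

```python
from typing import Dict, Iterable, Optional

# Hypothetical representation of a sense-vs.-clue correlation matrix:
# correlation[sense][clue] -> correlation between that sense and that clue word.
CorrelationMatrix = Dict[str, Dict[str, float]]


def select_sense(
    correlation: CorrelationMatrix,
    context_words: Iterable[str],
    clue_weights: Optional[Dict[str, float]] = None,
) -> Optional[str]:
    """Pick the sense whose weighted sum of correlations with the clues
    found in the context is highest.

    Assumptions (not taken from the paper): each distinct context word is
    counted once, clues missing from `clue_weights` get weight 1.0, and
    None is returned when no known clue occurs in the context.
    """
    context = set(context_words)
    # Every word that has a correlation entry for at least one sense is a clue.
    known_clues = {clue for row in correlation.values() for clue in row}
    clues_in_context = known_clues & context
    if not clues_in_context:
        return None  # no evidence; a real system might back off to a default sense

    weights = clue_weights or {}

    def score(sense: str) -> float:
        row = correlation[sense]
        return sum(weights.get(clue, 1.0) * row.get(clue, 0.0)
                   for clue in clues_in_context)

    # max() keeps the first sense encountered when scores tie.
    return max(correlation, key=score)


if __name__ == "__main__":
    # Toy, made-up correlations for two senses of "tank".
    toy = {
        "container": {"water": 0.8, "fuel": 0.6, "army": 0.1},
        "vehicle": {"army": 0.9, "fuel": 0.3, "water": 0.05},
    }
    print(select_sense(toy, "the army refilled the fuel tank".split()))  # vehicle
```

Because the abstract does not state how the clue weights are set, they are left as an optional parameter here; the iterative computation of the correlation matrix itself is not sketched.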
Pages: 289-301
Number of pages: 13
Related papers
50 in total
[41] Effect of Supervised Sense Disambiguation Model Using Machine Learning Technique and Word Embedding in Word Sense Disambiguation [J].
Mahajan, Rupesh; Kokane, Chandrakant; Pathak, Kishor; Kodmelwar, Manohar; Wagh, Kapil; Bhandari, Mahesh.
JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20(01): 436-443
[42] Word Sense Disambiguation Using an Evolutionary Approach [J].
Menai, Mohamed El Bachir.
INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2014, 38(02): 155-169
[43] A semantic matching energy function for learning with multi-relational data: Application to word-sense disambiguation [J].
Bordes, Antoine; Glorot, Xavier; Weston, Jason; Bengio, Yoshua.
MACHINE LEARNING, 2014, 94(02): 233-259
[44] Word Sense Disambiguation Using Word Ontology and Concept Distribution [J].
Hung, Jason C.; Yang, Che-Yu.
JOURNAL OF THE CHINESE INSTITUTE OF ENGINEERS, 2009, 32(02): 153-168
[45] Automatic Methods for the Extension of a Bilingual Dictionary using Comparable Corpora [J].
Rosner, Michael; Sultana, Kurt.
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014: 3790-3797
[46] Sense Space for Word Sense Disambiguation [J].
Kang, Myung Yun; Min, Tae Hong; Lee, Jae Sung.
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2018: 669-672
[47] An improved word sense disambiguation method for Chinese full-words based on unsupervised learning [J].
Li X.; Liu G.-H.; Zhang D.-M.
Zidonghua Xuebao/Acta Automatica Sinica, 2010, 36(01): 184-187
[48] Adopting Domain Knowledge to Enhance Lexical Chain for Unsupervised Word Sense Disambiguation [J].
Lee, Wei Jan; Mit, Edwin.
PROCEEDINGS OF THE 2011 3RD INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGY AND ENGINEERING (ICSTE 2011), 2011: 13-18
[49] Combining Lexical Stability and Improved Lexical Chain for Unsupervised Word Sense Disambiguation [J].
Chen, Junpeng; Liu, Juan; Yu, Wei; Wu, Peng.
2009 SECOND INTERNATIONAL SYMPOSIUM ON KNOWLEDGE ACQUISITION AND MODELING: KAM 2009, VOL 1, 2009: 430-+
[50] Practice of Word Sense Disambiguation [J].
Sieminski, Andrzej.
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2018, PT I, 2018, 10751: 159-169