Unsupervised word-sense disambiguation using bilingual comparable corpora

Cited by: 6
Authors
Kaji, H [1]
Morimoto, Y [1]
Affiliation
[1] Hitachi Ltd, Cent Res Lab, Kokubunji, Tokyo 1858601, Japan
Keywords
word-sense disambiguation; unsupervised learning; comparable corpora
DOI
10.1093/ietisy/E88-D.2.289
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
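The sense-selection step described in the abstract (choose the sense that maximizes a weighted sum of sense-vs.-clue correlations over the clues found in the context) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sense labels, clue words, correlation values, and the default weight of 1.0 are all hypothetical, and the iterative calculation of the correlation matrix is not reproduced because the abstract does not specify it.

```python
from typing import Dict, Iterable, Optional

# Hypothetical sense-vs.-clue correlation matrix for the target word "bank".
# Every label and number here is invented for illustration only.
CORRELATION: Dict[str, Dict[str, float]] = {
    "financial_institution": {"loan": 0.9, "interest": 0.8, "river": 0.0},
    "river_side": {"loan": 0.0, "interest": 0.1, "river": 0.9},
}


def sense_scores(
    correlation: Dict[str, Dict[str, float]],
    context_clues: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Score each sense as a weighted sum of its correlations with the clues.

    Clues missing from `weights` get weight 1.0 (an assumption); clues
    missing from a sense's row contribute a correlation of 0.0.
    """
    weights = weights or {}
    clues = list(context_clues)
    return {
        sense: sum(weights.get(c, 1.0) * row.get(c, 0.0) for c in clues)
        for sense, row in correlation.items()
    }


def select_sense(
    correlation: Dict[str, Dict[str, float]],
    context_clues: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Return the sense whose score is highest for the given context."""
    scores = sense_scores(correlation, context_clues, weights)
    return max(scores, key=scores.get)
```

For example, `select_sense(CORRELATION, ["loan", "interest"])` picks the financial sense under these made-up values, and raising the weight of one clue can change which sense wins.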
Pages: 289-301
Number of pages: 13
Related papers
50 in total
  • [31] Arabic word sense disambiguation using sense inventories
    Alian M.
    Awajan A.
    [J]. International Journal of Information Technology, 2023, 15 (2) : 735 - 744
  • [32] State of the art versus classical clustering for unsupervised word sense disambiguation
    Popescu, Marius
    Hristea, Florentina
    [J]. ARTIFICIAL INTELLIGENCE REVIEW, 2011, 35 (03) : 241 - 264
  • [33] Combining Supervised and Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation
    E. Agirre
    G. Rigau
    L. Padró
    J. Atserias
    [J]. Computers and the Humanities, 2000, 34 : 103 - 108
  • [34] State of the art versus classical clustering for unsupervised word sense disambiguation
    Marius Popescu
    Florentina Hristea
    [J]. Artificial Intelligence Review, 2011, 35 : 241 - 264
  • [35] Selecting Training Data for Unsupervised Domain Adaptation in Word Sense Disambiguation
    Komiya, Kanako
    Sasaki, Minoru
    Shinnou, Hiroyuki
    Kotani, Yoshiyuki
    Okumura, Manabu
    [J]. PRICAI 2016: TRENDS IN ARTIFICIAL INTELLIGENCE, 2016, 9810 : 220 - 232
  • [36] An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
    Ustalov, Dmitry
    Teslenko, Denis
    Panchenko, Alexander
    Chernoskutov, Mikhail
    Biemann, Chris
    Ponzetto, Simone Paolo
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1018 - 1022
  • [37] Combining supervised and unsupervised lexical knowledge methods for word sense disambiguation
    Agirre, E
    Rigau, G
    Padró, L
    Atserias, J
    [J]. COMPUTERS AND THE HUMANITIES, 2000, 34 (1-2): : 103 - 108
  • [38] Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary
    Yoon, Y
    Seon, CN
    Lee, S
    Seo, J
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2006, 42 (03) : 710 - 722
  • [39] Using Verb Subcategorization for Word Sense Disambiguation
    Roberts, Will
    Kordoni, Valia
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 829 - 832
  • [40] Word sense disambiguation using implicit information
    Jain, Goonjan
    Lobiyal, D. K.
    [J]. NATURAL LANGUAGE ENGINEERING, 2020, 26 (04) : 413 - 432