Unsupervised word-sense disambiguation using bilingual comparable corpora

Cited by: 6
Authors
Kaji, H [1]
Morimoto, Y [1]
Affiliation
[1] Hitachi Ltd, Cent Res Lab, Kokubunji, Tokyo 1858601, Japan
Keywords
word-sense disambiguation; unsupervised learning; comparable corpora
DOI
10.1093/ietisy/E88-D.2.289
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
An unsupervised method for word-sense disambiguation using bilingual comparable corpora was developed. First, it extracts word associations, i.e., statistically significant pairs of associated words, from the corpus of each language. Then, it aligns word associations by consulting a bilingual dictionary and calculates correlation between senses of a target polysemous word and its associated words, which can be regarded as clues for identifying the sense of the target word. To overcome the problem of disparity of topical coverage between corpora of the two languages as well as the problem of ambiguity in word-association alignment, an algorithm for iteratively calculating a sense-vs.-clue correlation matrix for each target word was devised. Word-sense disambiguation for each instance of the target word is done by selecting the sense that maximizes the score, i.e., a weighted sum of the correlations between each sense and clues appearing in the context of the instance. An experiment using Wall Street Journal and Nihon Keizai Shimbun corpora together with the EDR bilingual dictionary showed that the new method has promising performance; namely, the F-measure of its sense selection was 74.6% compared to a baseline of 62.8%. The developed method will possibly be extended into a fully unsupervised method that features automatic division and definition of word senses.
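The sense-selection step described in the abstract (choose the sense that maximizes a weighted sum of sense-vs.-clue correlations over the clues in the context) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the example target word "bank", the correlation values, and the default weight of 1.0 per clue are all assumptions made for illustration, and the iterative calculation of the correlation matrix itself is not shown.

```python
from typing import Dict, Iterable, Optional

# Hypothetical sense-vs.-clue correlation matrix for the target word "bank".
# All values are invented for illustration; none come from the paper.
CORRELATION: Dict[str, Dict[str, float]] = {
    "financial_institution": {"loan": 0.9, "interest": 0.8, "river": 0.0},
    "river_side": {"loan": 0.0, "interest": 0.1, "river": 0.9},
}


def sense_scores(
    correlation: Dict[str, Dict[str, float]],
    context_clues: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return, for each sense, the weighted sum of its correlations with the clues."""
    clues = list(context_clues)  # materialize so the clues can be reused per sense
    weights = weights or {}
    scores: Dict[str, float] = {}
    for sense, row in correlation.items():
        # Assumption: clues without an explicit weight get weight 1.0,
        # and clues missing from the matrix contribute 0.0.
        scores[sense] = sum(weights.get(c, 1.0) * row.get(c, 0.0) for c in clues)
    return scores


def select_sense(
    correlation: Dict[str, Dict[str, float]],
    context_clues: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Pick the sense with the highest score for one instance of the target word."""
    scores = sense_scores(correlation, context_clues, weights)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(select_sense(CORRELATION, ["loan", "interest"]))
    print(select_sense(CORRELATION, ["loan", "river"], weights={"river": 2.0}))
```

How the clue weights are actually defined is not stated in the abstract, so the sketch leaves them as a caller-supplied mapping.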
Pages: 289-301
Number of pages: 13