Word Clustering Algorithms Based on Word Similarity

Cited by: 2
Author
Yuan, Lichi [1]
Affiliation
[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330013, Peoples R China
Source
2015 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS IHMSC 2015, VOL I | 2015
Keywords
Word similarity; Word clustering; Statistical language model
DOI
10.1109/IHMSC.2015.36
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Category-based statistical language models are an important way to address the sparse data problem, but they face two bottlenecks: (1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; and (2) class-based methods always lose some predictive ability when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented. Based on word similarity, a definition of word set similarity is given. Experiments show that the word clustering algorithm based on similarity outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
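The abstract does not state the paper's actual definitions, so the following is only an illustrative sketch of how a mutual-information-based word similarity and a word set similarity might be built. Everything here is an assumption rather than the author's method: the function names (`pmi_table`, `word_similarity`, `set_similarity`), the use of pointwise mutual information over adjacent word pairs, cosine similarity over right-context vectors, and average pairwise linkage between sets.

```python
import math
from collections import Counter


def pmi_table(tokens):
    """Pointwise mutual information for each adjacent pair (w1, w2).

    Assumption: probabilities are plain relative frequencies, no smoothing.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        pmi[(w1, w2)] = math.log(p_joint / (p1 * p2))
    return pmi


def context_vector(word, pmi):
    """Positive-PMI scores of `word` with each word observed right after it."""
    return {w2: v for (w1, w2), v in pmi.items() if w1 == word and v > 0}


def word_similarity(a, b, pmi):
    """Cosine similarity of two words' positive-PMI context vectors."""
    va, vb = context_vector(a, pmi), context_vector(b, pmi)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def set_similarity(set_a, set_b, pmi):
    """Average pairwise word similarity between two word sets (hypothetical choice)."""
    total = sum(word_similarity(a, b, pmi) for a in set_a for b in set_b)
    return total / (len(set_a) * len(set_b))


if __name__ == "__main__":
    corpus = "the cat sat the dog sat the cat ran the dog ran".split()
    table = pmi_table(corpus)
    print(word_similarity("cat", "dog", table))
    print(set_similarity({"cat", "the"}, {"dog"}, table))
```

With a set similarity of this kind, a clustering loop could repeatedly merge the two most similar word sets until a target number of classes remains; whether the paper proceeds this way is not stated in the abstract.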
Pages: 21-24
Page count: 4