A New Word Clustering Algorithm Based on Word Similarity

Cited by: 0
Authors
YUAN Lichi [1]
Affiliations
[1] School of Information Technology, Jiangxi University of Finance and Economics
Funding
National Natural Science Foundation of China
Keywords
Word similarity; Word clustering; Statistical language model
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing]
Discipline classification codes
081203; 0835
Abstract
The category-based statistical language model is an important approach to the data sparsity problem in statistical language models. However, this model has two bottlenecks: 1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; 2) loss of predictive ability, since class-based methods always give up some prediction power when adapting to text from different domains. To address these problems, a novel definition of word similarity based on mutual information is presented. Building on word similarity, a definition of word set similarity is given and a bottom-up hierarchical clustering algorithm is proposed. Experimental results show that the word clustering algorithm based on word similarity outperforms the conventional greedy clustering method in both speed and performance; perplexity is reduced from 283 to 207.8.
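The abstract names three ingredients (a mutual-information-based word similarity, a word set similarity, and bottom-up hierarchical clustering) without giving their definitions. The following is only a minimal sketch of how such a pipeline can fit together, not the paper's method. Everything specific in it is an assumption: word similarity is taken as the cosine of positive pointwise mutual information vectors over right-hand neighbours, set similarity is taken as the average pairwise word similarity, and the function names `ppmi_vectors`, `word_similarity`, `set_similarity`, and `cluster` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_vectors(tokens):
    """Map each word to a sparse vector of positive PMI scores with its right-hand neighbours.

    Assumption: PMI over adjacent word pairs stands in for the paper's
    mutual-information-based definition, which the abstract does not state.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    vectors = {w: {} for w in unigrams}
    for (w, c), count in bigrams.items():
        p_joint = count / n_bi
        p_indep = (unigrams[w] / n_uni) * (unigrams[c] / n_uni)
        pmi = math.log(p_joint / p_indep)
        if pmi > 0:  # keep only positive associations
            vectors[w][c] = pmi
    return vectors


def word_similarity(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def set_similarity(a, b, vectors):
    """Assumed word set similarity: average pairwise word similarity between two sets."""
    total = sum(word_similarity(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster(tokens, k):
    """Bottom-up clustering: start from singletons, merge the most similar pair until k remain."""
    vectors = ppmi_vectors(tokens)
    clusters = [frozenset([w]) for w in sorted(vectors)]
    while len(clusters) > k:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: set_similarity(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


if __name__ == "__main__":
    corpus = "the cat sleeps . the dog sleeps . a cat eats . a dog eats .".split()
    for group in cluster(corpus, 4):
        print(sorted(group))
```

On the toy corpus, words that share right-hand neighbours (for example `cat` and `dog`) end up in the same cluster. The exhaustive pair search costs a full pass over all cluster pairs per merge, so this sketch says nothing about the speed advantage the abstract reports.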
Pages: 1221-1226
Number of pages: 6
Related papers
50 in total
  • [1] A New Word Clustering Algorithm Based on Word Similarity
    Yuan Lichi
    CHINESE JOURNAL OF ELECTRONICS, 2017, 26 (06) : 1221 - 1226
  • [2] Word Clustering Algorithms Based on Word Similarity
    Yuan, Lichi
    2015 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS IHMSC 2015, VOL I, 2015, : 21 - 24
  • [3] Word Clustering based on Word2vec and Semantic Similarity
    Luo Jie
    Wang Qinglin
    Li Yuan
    2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 517 - 521
  • [4] Word Similarity Algorithm Based on WordNet And HowNet
    Ren, Wuling
    Guo, Jinju
    MECHANICAL ENGINEERING AND GREEN MANUFACTURING II, PTS 1 AND 2, 2012, 155-156 : 375 - 380
  • [5] An Improved Algorithm of Word Semantic Similarity Based on HowNet
    Kang, Bocheng
    Qi, Junpeng
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 266 - 271
  • [6] Word similarity algorithm based on multi-features
    Guo, Chang-Jin
    Pan, Feng
    Zuo, Yi
    JOURNAL OF INTERDISCIPLINARY MATHEMATICS, 2018, 21 (05) : 1067 - 1072
  • [7] A genetic word clustering algorithm
    Hernandez, G
    Bobadilla, L
    Sanchez, O
    2005 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-3, PROCEEDINGS, 2005, : 1075 - 1080
  • [8] A visual word clustering algorithm based on affinity propagation
    Zhao, Jian
    Sun, Cheng
    Ma, Miao
    Xie, Yu
    2012 7TH INTERNATIONAL CONFERENCE ON SYSTEM OF SYSTEMS ENGINEERING (SOSE), 2012, : 14 - 17
  • [9] A word-based soft clustering algorithm for documents
    Lin, KI
    Kondadadi, R
    COMPUTERS AND THEIR APPLICATIONS, 2001, : 391 - 394
  • [10] A Fast Algorithm of Computing Word Similarity
    Chen, Xingyuan
    Yang, Xia
    Su, Bingjun
    2013 9TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2013, : 405 - 408