A New Word Clustering Algorithm Based on Word Similarity

Cited by: 0
Authors
YUAN Lichi [1]
Affiliation
[1] School of Information Technology, Jiangxi University of Finance and Economics
Funding
National Natural Science Foundation of China;
Keywords
Word similarity; Word clustering; Statistical language model;
DOI
Not available
Chinese Library Classification (CLC) number
TP391.1 [Text information processing];
Discipline classification code
081203 ; 0835 ;
Abstract
Category-based statistical language models are an important way to address data sparseness in statistical language modeling. Such models face two bottlenecks: 1) word clustering, where it is hard to find a method that performs well without a large amount of computation; and 2) class-based methods always lose some predictive ability when adapting to text from different domains. To address these problems, a novel definition of word similarity based on mutual information is presented. Building on word similarity, a definition of word-set similarity is given and a bottom-up hierarchical clustering algorithm is proposed. Experimental results show that the word clustering algorithm based on word similarity outperforms the conventional greedy clustering method in both speed and performance: perplexity is reduced from 283 to 207.8.
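The abstract does not state the paper's actual similarity formulas, so the following is only an illustrative sketch of the general idea, not the author's method. It assumes (hypothetically) that each word is represented by the pointwise mutual information it shares with its left and right neighbours, that word similarity is the cosine of those vectors, and that word-set similarity is the average pairwise word similarity. All function names and the toy corpus are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_vectors(corpus):
    """Map each word to a sparse vector of positive PMI values with its
    left ("L") and right ("R") neighbours. Assumed representation."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    vectors = {w: {} for w in unigrams}
    for (w1, w2), count in bigrams.items():
        p_joint = count / total_bi
        p_w1 = unigrams[w1] / total_uni
        p_w2 = unigrams[w2] / total_uni
        pmi = math.log(p_joint / (p_w1 * p_w2))
        if pmi > 0:  # keep only positive associations
            vectors[w1][("R", w2)] = pmi
            vectors[w2][("L", w1)] = pmi
    return vectors


def word_similarity(u, v):
    """Cosine similarity of two sparse PMI vectors (assumed definition)."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def set_similarity(a, b, vectors):
    """Average pairwise word similarity between two word sets (assumed)."""
    total = sum(word_similarity(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster_words(vectors, num_clusters):
    """Bottom-up clustering: start with singletons and repeatedly merge
    the two most similar word sets until num_clusters remain."""
    clusters = [frozenset([w]) for w in sorted(vectors)]
    while len(clusters) > num_clusters:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: set_similarity(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Toy corpus (invented): determiners, nouns and verbs share contexts.
corpus = [
    ["the", "cat", "sleeps"], ["the", "dog", "sleeps"],
    ["a", "cat", "eats"], ["a", "dog", "eats"],
    ["the", "cat", "eats"], ["a", "dog", "sleeps"],
]
clusters = cluster_words(pmi_vectors(corpus), 3)
```

On this toy corpus the three resulting word sets group the determiners, the nouns and the verbs. Each merge step here scans all cluster pairs, which is fine for a sketch but would need caching of pairwise similarities for a realistic vocabulary.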
Pages: 1221 - 1226
Number of pages: 6
Related papers
50 in total
  • [11] A CLUSTERING AND WORD SIMILARITY BASED APPROACH FOR IDENTIFYING PRODUCT FEATURE WORDS
    Suryadi, Dedy
    Kim, Harrison
    DS87-6: PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN (ICED 17) VOL 6: DESIGN INFORMATION AND KNOWLEDGE, 2017, : 71 - 80
  • [12] Word clustering based on similarity and vari-gram language model
    Yuan, LC
    Zhong, YX
    ICCC2004: Proceedings of the 16th International Conference on Computer Communication Vol 1 and 2, 2004, : 1222 - 1226
  • [13] Clustering words for statistical language models based on contextual word similarity
    Farhat, A
    Isabelle, JF
    OShaughnessy, D
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 180 - 183
  • [14] An improved Chinese word semantic similarity algorithm based on CiLin
    Li, Fei
    Zhu, Xinhua
    Chen, Hongchao
    Ma, Runcong
    Deng, Han
    Journal of Information and Computational Science, 2015, 12 (10): : 3799 - 3807
  • [15] Word sense disambiguation based on word sense clustering
    Anaya-Sanchez, Henry
    Pons-Porrata, Aurora
    Berlanga-Llavori, Rafael
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA-SBIA 2006, PROCEEDINGS, 2006, 4140 : 472 - 481
  • [16] A novel word clustering algorithm based on latent semantic analysis
    Bellegarda, JR
    Butzberger, JW
    Chow, YL
    Coccaro, NB
    Naik, D
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 172 - 175
  • [17] Similarity Word-Sequence Kernels for Sentence Clustering
    Andres-Ferrer, Jesus
    Sanchis-Trilles, German
    Casacuberta, Francisco
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218 : 610 - 619
  • [18] AUDIO WORD SIMILARITY FOR CLUSTERING WITH ZERO RESOURCES BASED ON ITERATIVE HMM CLASSIFICATION
    Royer, Amelie
    Gravier, Guillaume
    Claveau, Vincent
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5340 - 5344
  • [19] New Word Pair Level Embeddings to Improve Word Pair Similarity
    Shaukat, Asma
    Khan, Nazar
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2017), VOL 5, 2017, : 57 - 62
  • [20] Computation of Word Similarity Based on the Information Content of Sememes and PageRank Algorithm
    Li, Hao
    Mu, Lingling
    Zan, Hongying
    CHINESE LEXICAL SEMANTICS, CLSW 2016, 2016, 10085 : 416 - 425