A New Word Clustering Algorithm Based on Word Similarity

Cited by: 0

Authors:

YUAN Lichi [1]

Affiliations:

[1] School of Information Technology, Jiangxi University of Finance and Economics

Source:

Chinese Journal of Electronics | 2017 / Vol. 26 / No. 6

Funding:

National Natural Science Foundation of China;

Keywords:

Word similarity; Word clustering; Statistical language model;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP391.1 [Text information processing];

Subject classification codes:

081203; 0835;

Abstract:

Category-based statistical language models are an important way to address data sparsity in statistical language modeling. However, this approach has two bottlenecks: 1) word clustering: it is hard to find a clustering method that performs well without requiring a large amount of computation; 2) class-based methods always lose some predictive ability when adapting to text from different domains. To address these problems, a novel definition of word similarity based on mutual information is presented. Based on word similarity, a definition of word set similarity is given and a bottom-up hierarchical clustering algorithm is proposed. Experimental results show that the word clustering algorithm based on word similarity outperforms the conventional greedy clustering method in both speed and performance; perplexity is reduced from 283 to 207.8.
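The abstract names three ingredients (a mutual-information-based word similarity, a word set similarity, and bottom-up hierarchical clustering) but does not give their definitions. The following is a minimal sketch of how such a pipeline could look, not the paper's method. Everything in it is an assumption made for illustration: words are represented by positive pointwise mutual information with their right neighbor, word similarity is the cosine of those vectors, word set similarity is the average pairwise word similarity, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_vectors(tokens):
    """Map each word to {next_word: positive PMI}, estimated from bigram counts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    vectors = {w: {} for w in unigrams}
    for (w, c), count in bigrams.items():
        p_joint = count / n_bi
        p_indep = (unigrams[w] / n_uni) * (unigrams[c] / n_uni)
        pmi = math.log(p_joint / p_indep)
        if pmi > 0:  # keep only positive associations
            vectors[w][c] = pmi
    return vectors


def word_similarity(u, v):
    """Cosine similarity of two sparse PMI vectors (assumed definition)."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def set_similarity(a, b, vectors):
    """Average pairwise word similarity between two word sets (assumed definition)."""
    total = sum(word_similarity(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster_words(tokens, num_classes):
    """Bottom-up clustering: start from singletons, merge the most similar pair."""
    vectors = pmi_vectors(tokens)
    clusters = [frozenset([w]) for w in sorted(vectors)]
    while len(clusters) > num_classes:
        a, b = max(
            combinations(clusters, 2),
            key=lambda pair: set_similarity(pair[0], pair[1], vectors),
        )
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


if __name__ == "__main__":
    corpus = "the cat runs . the dog runs . a cat sleeps . a dog sleeps .".split()
    for word_class in cluster_words(corpus, 4):
        print(sorted(word_class))
```

On this toy corpus, words that share right-hand contexts (for example "cat" and "dog") end up in the same class. The exhaustive pair search is quadratic in the number of clusters per merge, so the sketch only illustrates the structure and says nothing about the speed the abstract reports.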

Pages: 1221-1226

Number of pages: 6

50 items in total

[11] A CLUSTERING AND WORD SIMILARITY BASED APPROACH FOR IDENTIFYING PRODUCT FEATURE WORDS
Suryadi, Dedy
Kim, Harrison
DS87-6: PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON ENGINEERING DESIGN (ICED 17) VOL 6: DESIGN INFORMATION AND KNOWLEDGE, 2017: 71-80
[12] Word clustering based on similarity and vari-gram language model
Yuan, LC
Zhong, YX
ICCC2004: Proceedings of the 16th International Conference on Computer Communication Vol 1 and 2, 2004: 1222-1226
[13] Clustering words for statistical language models based on contextual word similarity
Farhat, A
Isabelle, JF
O'Shaughnessy, D
1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996: 180-183
[14] An improved Chinese word semantic similarity algorithm based on CiLin
Li, Fei
Zhu, Xinhua
Chen, Hongchao
Ma, Runcong
Deng, Han
Journal of Information and Computational Science, 2015, 12(10): 3799-3807
[15] Word sense disambiguation based on word sense clustering
Anaya-Sanchez, Henry
Pons-Porrata, Aurora
Berlanga-Llavori, Rafael
ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA-SBIA 2006, PROCEEDINGS, 2006, 4140: 472-481
[16] A novel word clustering algorithm based on latent semantic analysis
Bellegarda, JR
Butzberger, JW
Chow, YL
Coccaro, NB
Naik, D
1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996: 172-175
[17] Similarity Word-Sequence Kernels for Sentence Clustering
Andres-Ferrer, Jesus
Sanchis-Trilles, German
Casacuberta, Francisco
STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218: 610-619
[18] AUDIO WORD SIMILARITY FOR CLUSTERING WITH ZERO RESOURCES BASED ON ITERATIVE HMM CLASSIFICATION
Royer, Amelie
Gravier, Guillaume
Claveau, Vincent
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 5340-5344
[19] New Word Pair Level Embeddings to Improve Word Pair Similarity
Shaukat, Asma
Khan, Nazar
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2017), VOL 5, 2017: 57-62
[20] Computation of Word Similarity Based on the Information Content of Sememes and PageRank Algorithm
Li, Hao
Mu, Lingling
Zan, Hongying
CHINESE LEXICAL SEMANTICS, CLSW 2016, 2016, 10085: 416-425