Word Clustering Algorithms Based on Word Similarity

Cited by: 2

Authors:

Yuan, Lichi [1]

Affiliation:

[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330013, Peoples R China

Source:

2015 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS IHMSC 2015, VOL I | 2015

Keywords:

Word similarity; Word clustering; Statistical language model;

DOI:

10.1109/IHMSC.2015.36

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Category-based statistical language models are an important way to address the problem of sparse data, but they have two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well without a large amount of computation; and (2) class-based methods always lose some predictive power when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and a definition of word-set similarity is built on it. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
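This record does not reproduce the paper's model or its similarity definition, so the following is only a minimal sketch of what a category-based bigram model and its perplexity look like. It assumes the standard class-based factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i), uses unsmoothed maximum-likelihood counts, and takes the word-to-class mapping as given rather than learning it. The function names, the toy corpus, and the class labels are all hypothetical.

```python
import math
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Return prob(prev_word, word) for an unsmoothed class-based bigram model.

    Assumed factorization (not taken from the paper):
        P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i)
    """
    classes = [word2class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    # Count each class only where it has a successor, so the
    # class-transition estimates sum to one.
    prev_class_counts = Counter(classes[:-1])

    def prob(prev_word, word):
        c_prev, c = word2class[prev_word], word2class[word]
        p_class = class_bigrams[(c_prev, c)] / prev_class_counts[c_prev]
        p_word = word_counts[word] / class_counts[c]
        return p_class * p_word

    return prob


def perplexity(prob, tokens):
    """Perplexity over the bigrams of `tokens`: 2 ** (-mean log2 probability).

    No smoothing is applied, so an unseen transition raises an error.
    """
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log2(prob(a, b)) for a, b in pairs)
    return 2 ** (-log_sum / len(pairs))


# Hypothetical toy corpus and hand-assigned classes.
corpus = "the cat sat the dog sat".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V"}

model = train_class_bigram(corpus, word2class)
print(perplexity(model, corpus))
```

In this setup the quality of the word-to-class mapping determines the perplexity, which is why the choice of clustering method matters for the figures reported in the abstract.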

Pages: 21-24

Page count: 4
