Text Document Clustering Based on Density K-means

Cited by: 0
Authors
Wu, Di [1]
Zeng, Yan [2]
Qu, Yin-chuan [2]
Affiliations
[1] Natl Univ Def, Coll Comp, Changsha, Hunan, Peoples R China
[2] Beijing Gaodi Informat Technol Co Ltd, Beijing, Peoples R China
Source
INTERNATIONAL CONFERENCE ON COMPUTER, MECHATRONICS AND ELECTRONIC ENGINEERING (CMEE 2016) | 2016
Funding
National Natural Science Foundation of China;
Keywords
K-means; Density; Text document; Clustering; NUMBER;
DOI
Not available
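Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject classification code
0808; 0809;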
Abstract
K-means is one of the most fundamental clustering techniques. It has been applied in many fields, such as image processing and natural language processing, and performs well in many cases, especially on large data sets. However, choosing the initial cluster centers is a hard problem: different choices can make the K-means results unstable or even trap the algorithm in a local optimum. Many methods have been proposed to address this, but they apply only to certain fields and perform poorly on text document clustering. In this paper, we design a novel density K-means algorithm and apply it to text document clustering. Experimental results on a Chinese corpus show that it outperforms most existing methods. Furthermore, compared with other algorithms, our algorithm effectively reduces the number of iterations.
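The abstract does not spell out the density K-means procedure, so the following is only a minimal sketch of the general idea it describes: replace arbitrary initial centers with centers chosen by local density, then run ordinary (Lloyd-style) K-means from them. The density measure (number of neighbors within a hypothetical `radius`), the seeding score (density multiplied by distance to the nearest already chosen center), and the function names `densities`, `density_init`, and `kmeans` are illustrative assumptions, not the authors' algorithm. For brevity it uses Euclidean distance on small dense vectors; real text documents would usually be TF-IDF vectors, for which cosine distance is a common choice.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def densities(points: Sequence[Vector], radius: float) -> List[int]:
    """Assumed density measure: count of other points within `radius`."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and euclidean(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def density_init(points: Sequence[Vector], k: int, radius: float) -> List[Vector]:
    """Pick k seeds that are dense AND far from seeds already chosen.

    Hypothetical heuristic: start at the densest point, then repeatedly take
    the point maximising density * distance to its nearest chosen seed.
    Isolated points have density 0, so they score 0 and are avoided.
    """
    dens = densities(points, radius)
    chosen = [max(range(len(points)), key=lambda i: dens[i])]
    while len(chosen) < k:
        def score(i: int) -> float:
            nearest = min(euclidean(points[i], points[c]) for c in chosen)
            return dens[i] * nearest

        candidates = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(candidates, key=score))
    return [list(points[i]) for i in chosen]


def kmeans(
    points: Sequence[Vector], centers: Sequence[Vector], max_iter: int = 100
) -> Tuple[List[Vector], List[int], int]:
    """Standard K-means from given seeds.

    Returns (centers, labels, passes), where `passes` is the number of
    assignment passes run until the labels stopped changing.
    """
    centers = [list(c) for c in centers]
    labels = [-1] * len(points)
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            return centers, labels, it
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, max_iter


if __name__ == "__main__":
    # Two tight groups plus one isolated point at (5, 30).
    data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
            [5.0, 30.0]]
    seeds = density_init(data, k=2, radius=1.5)
    final_centers, final_labels, passes = kmeans(data, seeds)
    print(seeds, final_labels, passes)
```

Unlike random seeding, this choice is deterministic, and the isolated point in the demo is never picked as a seed because its density is zero. Whether such seeding reduces iterations on a real Chinese corpus is the paper's experimental claim; this toy example does not test it.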
Pages: 8