An Improved K-means Text Clustering Algorithm by Optimizing Initial Cluster Centers

Cited by: 0
Authors
Xiong, Caiquan [1 ]
Hua, Zhen [1 ]
Lv, Ke [1 ]
Li, Xuan [1 ]
Affiliation
[1] Hubei Univ Technol, Sch Comp Sci, Wuhan, Hubei, Peoples R China
Source
2016 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA (CCBD), 2016
Funding
National Natural Science Foundation of China
Keywords
K-means algorithm; initial cluster centers; Text clustering;
DOI
10.1109/CCBD.2016.29
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
K-means is an influential clustering algorithm in data mining. The traditional K-means algorithm is sensitive to the choice of initial cluster centers, so its clustering result depends excessively on them. To overcome this shortcoming, this paper proposes an improved K-means text clustering algorithm that optimizes the initial cluster centers. The algorithm first calculates the density of each data object in the data set and then determines which data objects are isolated points. After all isolated points are removed, a set of high-density data objects is obtained. From this set, k data objects that are at the largest distance from one another are chosen as the initial cluster centers. The experimental results show that the improved K-means algorithm can improve the stability and accuracy of text clustering.
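The abstract does not say how density, isolated points, or the "largest distance" criterion are defined, so the following is only a minimal sketch of the general idea under assumptions that are not taken from the paper: density is the number of other points within a radius equal to the mean pairwise Euclidean distance; a point is treated as isolated when its density is below the mean density; and the k mutually distant centers are picked with a farthest-first greedy heuristic (start from the farthest high-density pair, then repeatedly add the candidate whose nearest chosen center is farthest away). The helper names `densities`, `select_initial_centers`, `assign`, and `kmeans` are hypothetical. For real text clustering the points would be document vectors (for example TF-IDF), often compared with cosine rather than Euclidean distance.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def densities(points: Sequence[Vector], radius: float) -> List[int]:
    """Assumed density: number of other points within `radius` (Euclidean)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def select_initial_centers(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Hypothetical sketch: drop low-density (isolated) points, then pick k
    mutually distant high-density points with a farthest-first heuristic."""
    n = len(points)
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need at least 2 points and 1 <= k <= len(points)")
    # Assumption: the neighbourhood radius is the mean pairwise distance.
    pair_dists = [math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)]
    radius = sum(pair_dists) / len(pair_dists)
    dens = densities(points, radius)
    # Assumption: a point is "isolated" if its density is below the mean density.
    mean_density = sum(dens) / n
    candidates = [i for i in range(n) if dens[i] >= mean_density]
    if len(candidates) < k:
        raise ValueError("fewer high-density points than k")
    if k == 1:
        chosen = [max(candidates, key=lambda i: dens[i])]
    else:
        # Start from the two high-density points that are farthest apart.
        first_pair = max(
            ((a, b) for pos, a in enumerate(candidates) for b in candidates[pos + 1:]),
            key=lambda ab: math.dist(points[ab[0]], points[ab[1]]),
        )
        chosen = list(first_pair)
    # Greedily add the candidate whose nearest chosen center is farthest away.
    while len(chosen) < k:
        chosen.append(max(
            (i for i in candidates if i not in chosen),
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        ))
    return [list(points[i]) for i in chosen]


def assign(points: Sequence[Vector], centers: Sequence[Vector]) -> List[int]:
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def kmeans(points: Sequence[Vector], initial_centers: Sequence[Vector], max_iter: int = 100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster becomes empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)] if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return assign(points, centers), centers


if __name__ == "__main__":
    # Two tight groups plus one isolated point at (50, 50).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
    init = select_initial_centers(data, 2)
    print(init)  # [[0, 0], [11, 11]]
    print(kmeans(data, init)[0])  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy run the isolated point has density 0 and is excluded from the candidate set, so it cannot be picked as an initial center, although it is still assigned to a cluster during the K-means iterations. The thresholds above are illustrative choices; the paper's own definitions may differ.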
Pages: 265-268
Number of pages: 4
Related papers
50 in total
  • [41] A Density-Based Method for Selection of the Initial Clustering Centers of K-means Algorithm
    Du, Xin
    Xu, Ning
    Zhou, Cailan
    Xiao, Shihui
    2017 IEEE 2ND ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2017: 2509-2512
  • [42] On Careful Selection of Initial Centers for K-means Algorithm
    Jothi, R.
    Mohanty, Sraban Kumar
    Ojha, Aparajita
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND INFORMATICS (ICACNI 2015), VOL 1, 2016, 43: 435-445
  • [43] Cluster center initialization algorithm for K-means clustering
    Khan, SS
    Ahmad, A
    PATTERN RECOGNITION LETTERS, 2004, 25(11): 1293-1302
  • [44] An Effective Method Determining the Initial Cluster Centers for K-means for Clustering Gene Expression Data
    Tanir, Deniz
    Nuriyeva, Fidan
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2017: 751-754
  • [45] K-means Clustering Algorithm with Refined Initial Center
    Chen, Xuhui
    Xu, Yong
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS, VOLS 1-4, 2009: 2203-2206
  • [46] k*-means: A generalized k-means clustering algorithm with unknown cluster number
    Cheung, YM
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2002, 2002, 2412: 307-317
  • [47] An Improved Method Based on the Density and K-means Nearest Neighbor Text Clustering Algorithm
    Fan, Xiaojing
    Jiang, Mingyang
    Pei, Zhili
    Qiao, Shicheng
    Lian, Jie
    Wang, Chaoyong
    2ND INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY FOR EDUCATION (ICTE 2015), 2015: 312-315
  • [48] Automatic Text Summarization Method Based on Improved TextRank Algorithm and K-Means Clustering
    Liu, Wenjun
    Sun, Yuyan
    Yu, Bao
    Wang, Hailan
    Peng, Qingcheng
    Hou, Mengshu
    Guo, Huan
    Wang, Hai
    Liu, Cheng
    KNOWLEDGE-BASED SYSTEMS, 2024, 287
  • [49] Optimizing K-Means Text Document Clustering Using Latent Semantic Indexing and Pillar Algorithm
    Adinugroho, Sigit
    Sari, Yuita Arum
    Fauzi, M. Ali
    Adikara, Putra Pandu
    2017 5TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL AND BUSINESS INTELLIGENCE (ISCBI), 2017: 81-85
  • [50] A Clustering K-means Algorithm Based on Improved PSO Algorithm
    Tan, Long
    2015 FIFTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT2015), 2015: 940-944