An Improved K-means Text Clustering Algorithm by Optimizing Initial Cluster Centers

Cited by: 0

Authors:

Xiong, Caiquan [1]

Hua, Zhen [1]

Lv, Ke [1]

Li, Xuan [1]

Affiliations:

[1] Hubei Univ Technol, Sch Comp Sci, Wuhan, Hubei, Peoples R China

Source:

2016 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA (CCBD) | 2016

Funding:

National Natural Science Foundation of China;

Keywords:

K-means algorithm; initial cluster centers; Text clustering;

DOI:

10.1109/CCBD.2016.29

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The K-means clustering algorithm is an influential algorithm in data mining. The traditional K-means algorithm is sensitive to the initial cluster centers, so the clustering result depends excessively on the choice of those centers. To overcome this shortcoming, this paper proposes an improved K-means text clustering algorithm that optimizes the initial cluster centers. The algorithm first calculates the density of each data object in the data set and then judges which data objects are isolated points. After all isolated points are removed, a set of high-density data objects is obtained. The algorithm then chooses as the initial cluster centers the k high-density data objects that are farthest apart from one another. The experimental results show that the improved K-means algorithm can improve the stability and accuracy of text clustering.
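The initialization steps named in the abstract (density calculation, isolated-point removal, selection of k mutually distant high-density objects) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: density is taken as the number of neighbours within a radius `eps`, a point is treated as isolated when its density is below `min_density`, distances are Euclidean, and the "largest distance" criterion is approximated with a greedy farthest-first pass. The function name `pick_initial_centers` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def pick_initial_centers(X, k, eps, min_density=2):
    """Sketch of density-based K-means initialization (illustrative assumptions only)."""
    X = np.asarray(X, dtype=float)

    # Pairwise Euclidean distances between all data objects.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Assumed density: number of OTHER points within radius eps.
    density = (dist <= eps).sum(axis=1) - 1

    # Assumed isolation rule: drop points whose density is below min_density.
    keep = np.flatnonzero(density >= min_density)
    if keep.size < k:
        raise ValueError("fewer than k non-isolated points; relax eps or min_density")

    # Remaining high-density points, densest first (stable sort keeps input order on ties).
    high = keep[np.argsort(-density[keep], kind="stable")]

    # Greedy farthest-first selection: start from the densest point, then
    # repeatedly add the point whose nearest chosen center is farthest away.
    chosen = [high[0]]
    while len(chosen) < k:
        d_min = dist[np.ix_(high, chosen)].min(axis=1)
        chosen.append(high[int(np.argmax(d_min))])

    return X[chosen]
```

The returned array could then be passed as the starting centers to any standard K-means implementation. For text data, the rows of `X` would typically be document vectors (for example TF-IDF), and a cosine-based distance might be a more natural choice than the Euclidean distance assumed here.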

Pages: 265-268

Page count: 4

