A novel rough semi-supervised k-means algorithm for text clustering

Cited by: 4

Authors:

Tang, Lei-yu [1]

Wang, Zhen-hao [1]

Wang, Shu-dong [2]

Fan, Jian-cong [1]

Yue, Guo-wei [3]

Affiliations:

[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China

[2] China Univ Petr, Coll Comp Sci & Technol, Qingdao, Peoples R China

[3] Shandong Univ Sci & Technol, Key Lab Min Disaster Prevent & Control, Qingdao 266590, Peoples R China

Source:

INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION | 2023, Vol. 21, Issue 02

Funding:

National Natural Science Foundation of China;

Keywords:

rough set; approximation set; k-means algorithm; semi-supervised clustering; high dimensional sparse data; MODEL;

DOI:

10.1504/IJBIC.2023.130548

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Because many attribute values in high-dimensional sparse data are zero, we combine the approximation set of rough set theory with the semi-supervised k-means algorithm to propose a rough set-based semi-supervised k-means (RSKmeans) algorithm. First, the proportion of non-zero values is calculated from a few labelled data samples, and a small number of important attributes in each cluster are selected to calculate the clustering centres. Second, the approximation set is used to calculate the information gain of each attribute. Third, attribute values are partitioned into the corresponding approximation sets by comparing the information gain with the upper approximation and boundary threshold. New attributes are then added and the process is repeated to update the clustering centres. Experimental results on text data show that the RSKmeans algorithm helps find the important attributes, filter out invalid information, and improve performance significantly.
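The first step of the abstract (non-zero proportions from labelled samples, then attribute selection and centre calculation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the `top_m` cut-off, and the choice to zero out unselected attributes are all assumptions made for illustration. The information-gain and approximation-set steps are not sketched here.

```python
from collections import defaultdict


def group_by_label(samples, labels):
    """Group labelled rows by their cluster label."""
    grouped = defaultdict(list)
    for row, label in zip(samples, labels):
        grouped[label].append(row)
    return grouped


def nonzero_proportions(samples, labels):
    """Per cluster, the fraction of labelled rows that are non-zero in each attribute."""
    props = {}
    for label, rows in group_by_label(samples, labels).items():
        n_attrs = len(rows[0])
        props[label] = [
            sum(1 for r in rows if r[j] != 0) / len(rows) for j in range(n_attrs)
        ]
    return props


def select_attributes(props, top_m):
    """Assumed rule: keep the top_m attributes with the highest non-zero proportion."""
    selected = {}
    for label, p in props.items():
        ranked = sorted(range(len(p)), key=lambda j: (-p[j], j))
        selected[label] = sorted(ranked[:top_m])
    return selected


def seed_centres(samples, labels, selected):
    """Mean of the labelled rows per cluster; unselected attributes are left at zero (assumption)."""
    centres = {}
    for label, rows in group_by_label(samples, labels).items():
        centre = [0.0] * len(rows[0])
        for j in selected[label]:
            centre[j] = sum(r[j] for r in rows) / len(rows)
        centres[label] = centre
    return centres


# Toy term-count data: four labelled documents, four attributes (hypothetical values).
samples = [[2, 0, 0, 1], [3, 0, 0, 0], [0, 1, 4, 0], [0, 0, 2, 0]]
labels = ["a", "a", "b", "b"]

props = nonzero_proportions(samples, labels)
selected = select_attributes(props, top_m=2)
centres = seed_centres(samples, labels, selected)
print(selected)
print(centres)
```

In a full semi-supervised k-means loop, centres of this kind would seed the assignment of the unlabelled samples; how RSKmeans then adds attributes and updates the centres is described only at the level of the abstract above.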


Pages: 57-68

Page count: 13

50 items in total

[1] A Semi-Supervised Text Clustering Approach Based on K-Means Algorithm
Zhan, Lizhang
Xu, Hong
Chen, Xiuguo
INTERNATIONAL CONFERENCE ON ENGINEERING AND BUSINESS MANAGEMENT (EBM2011), VOLS 1-6, 2011, : 2616 - 2620
[2] An Improved Semi-Supervised K-Means Clustering Algorithm
Ye Hanmin
Lv Hao
Sun Qianting
2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC), 2016, : 41 - 44
[3] K-means clustering algorithm based on semi-supervised learning
Department of Mathematics and Computer, Shangrao Normal College, Shangrao 334001, China
Unknown
J. Comput. Inf. Syst., 2008, 5 (2007-2013):
[4] Semi-supervised Text Categorization Using Recursive K-means Clustering
Gowda, Harsha S.
Suhil, Mahamad
Guru, D. S.
Raju, Lavanya Narayana
RECENT TRENDS IN IMAGE PROCESSING AND PATTERN RECOGNITION (RTIP2R 2016), 2017, 709 : 217 - 227
[5] A semi-supervised sparse K-Means algorithm
Vouros, Avgoustinos
Vasilaki, Eleni
PATTERN RECOGNITION LETTERS, 2021, 142 : 65 - 71
[6] Active Learning for Semi-Supervised K-Means Clustering
Vu, Viet-Vu
Labroche, Nicolas
Bouchon-Meunier, Bernadette
22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 1, 2010,
[7] Multi-relational Data Semi-supervised K-Means Clustering Algorithm
Xia, Zhanguo
Zhang, Wentao
Cai, Shiyu
Xia, Shixiong
ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PT I, 2011, 7002 : 413 - 420
[8] Semi-supervised k-means++
Yoder, Jordan
Priebe, Carey E.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2017, 87 (13) : 2597 - 2608
[9] A generalized K-means algorithm with semi-supervised weight coefficients
Morii, Fujiki
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 3, PROCEEDINGS, 2006, : 198 - 201
[10] Semi-supervised K-Means Clustering by Optimizing Initial Cluster Centers
Wang, Xin
Wang, Chaofei
Shen, Junyi
WEB INFORMATION SYSTEMS AND MINING, PT II, 2011, 6988 : 178 - +
