A novel rough semi-supervised k-means algorithm for text clustering

Cited by: 4
Authors
Tang, Lei-yu [1 ]
Wang, Zhen-hao [1 ]
Wang, Shu-dong [2 ]
Fan, Jian-cong [1 ]
Yue, Guo-wei [3 ]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[2] China Univ Petr, Coll Comp Sci & Technol, Qingdao, Peoples R China
[3] Shandong Univ Sci & Technol, Key Lab Min Disaster Prevent & Control, Qingdao 266590, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
rough set; approximation set; k-means algorithm; semi-supervised clustering; high dimensional sparse data; MODEL;
DOI
10.1504/IJBIC.2023.130548
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Since many attribute values of high-dimensional sparse data are zero, we combine the approximation set of rough set theory with the semi-supervised k-means algorithm and propose a rough set-based semi-supervised k-means (RSKmeans) algorithm. First, the proportion of non-zero values is calculated from a few labelled data samples, and a small number of important attributes in each cluster are selected to calculate the clustering centres. Second, the approximation set is used to calculate the information gain of each attribute. Third, the attribute values are partitioned into the corresponding approximation sets by comparing the information gain with the upper approximation and boundary threshold. New attributes are then added and the above process is repeated to update the clustering centres. Experimental results on text data show that the RSKmeans algorithm helps find the important attributes, filter out invalid information, and significantly improve clustering performance.
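The first step of the abstract (ranking attributes by their proportion of non-zero values in a few labelled samples, then computing centres from the selected attributes) can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names, the `top_m` parameter, the tie-breaking rule, and the distance used in `assign` are all assumptions, and the information-gain and approximation-set steps are omitted because the abstract does not define them.

```python
from collections import defaultdict


def nonzero_proportions(rows):
    """Fraction of rows in which each attribute is non-zero."""
    n_attrs = len(rows[0])
    return [sum(1 for r in rows if r[j] != 0) / len(rows) for j in range(n_attrs)]


def seed_centres(labelled, top_m):
    """Build sparse seed centres from labelled samples.

    labelled: list of (vector, cluster_label) pairs (hypothetical input format).
    Returns {label: {attribute_index: mean_value}}, keeping only the top_m
    attributes with the highest non-zero proportion inside each cluster.
    """
    by_cluster = defaultdict(list)
    for vec, label in labelled:
        by_cluster[label].append(vec)

    centres = {}
    for label, rows in by_cluster.items():
        props = nonzero_proportions(rows)
        # Assumption: rank by non-zero proportion, break ties by attribute index.
        ranked = sorted(range(len(props)), key=lambda j: (-props[j], j))
        keep = ranked[:top_m]
        centres[label] = {j: sum(r[j] for r in rows) / len(rows) for j in keep}
    return centres


def assign(vec, centres):
    """Assign vec to the nearest centre.

    Assumption: squared Euclidean distance over the union of all selected
    attributes, treating a centre as zero on attributes it did not select.
    """
    selected = set()
    for centre in centres.values():
        selected.update(centre)

    def dist(centre):
        return sum((vec[j] - centre.get(j, 0.0)) ** 2 for j in selected)

    return min(centres, key=lambda label: dist(centres[label]))


if __name__ == "__main__":
    # Toy term-count vectors with two labelled samples per cluster.
    labelled = [
        ([2, 0, 1, 0], "a"),
        ([4, 0, 0, 0], "a"),
        ([0, 3, 0, 1], "b"),
        ([0, 1, 0, 1], "b"),
    ]
    centres = seed_centres(labelled, top_m=2)
    print(centres)
    print(assign([3, 0, 1, 0], centres))
```

Storing each centre as a dictionary over its selected attributes keeps the representation sparse, which matches the motivation that most attribute values in text data are zero.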
Pages: 57-68
Number of pages: 13