A novel rough semi-supervised k-means algorithm for text clustering

Cited by: 4
Authors
Tang, Lei-yu [1]
Wang, Zhen-hao [1]
Wang, Shu-dong [2]
Fan, Jian-cong [1]
Yue, Guo-wei [3]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[2] China Univ Petr, Coll Comp Sci & Technol, Qingdao, Peoples R China
[3] Shandong Univ Sci & Technol, Key Lab Min Disaster Prevent & Control, Qingdao 266590, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
rough set; approximation set; k-means algorithm; semi-supervised clustering; high dimensional sparse data; MODEL;
DOI
10.1504/IJBIC.2023.130548
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Because many attribute values in high-dimensional sparse data are zero, we combine the approximation sets of rough set theory with the semi-supervised k-means algorithm and propose a rough set-based semi-supervised k-means (RSKmeans) algorithm. First, the proportion of non-zero values is calculated from a few labelled samples, and a small number of important attributes in each cluster are selected to calculate the clustering centres. Second, the approximation set is used to calculate the information gain of each attribute. Third, attribute values are partitioned into the corresponding approximation sets by comparing their information gain with the upper approximation and boundary threshold. New attributes are then added, and the process is repeated to update the clustering centres. Experimental results on text data show that the RSKmeans algorithm helps find the important attributes, filters out invalid information, and improves performance significantly.
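The first step of the abstract (selecting attributes by their proportion of non-zero values in a few labelled samples, then seeding the cluster centres from those samples) can be illustrated with a minimal sketch. This is not the authors' RSKmeans: the rough-set approximation sets and the information-gain step are omitted because the abstract does not define them, and what remains is plain seeded k-means with a per-cluster attribute mask. The function names, the `top_m` parameter, and the toy data are all hypothetical.

```python
import numpy as np

def nonzero_proportion(X_labelled, y_labelled, k):
    """Per cluster, the fraction of labelled samples with a non-zero value in each attribute."""
    props = np.zeros((k, X_labelled.shape[1]))
    for c in range(k):
        members = X_labelled[y_labelled == c]
        if len(members) > 0:
            props[c] = (members != 0).mean(axis=0)
    return props

def seeded_kmeans(X, labelled_idx, y_labelled, k, top_m=2, n_iter=20):
    """Seeded k-means with a per-cluster attribute mask (illustrative assumption, not RSKmeans)."""
    X = np.asarray(X, dtype=float)
    labelled_idx = np.asarray(labelled_idx)
    y_labelled = np.asarray(y_labelled)

    # Keep the top_m attributes per cluster, ranked by non-zero proportion.
    props = nonzero_proportion(X[labelled_idx], y_labelled, k)
    mask = np.zeros(props.shape, dtype=bool)
    for c in range(k):
        mask[c, np.argsort(props[c])[::-1][:top_m]] = True

    # Initial centres: mean of the labelled samples, restricted to the kept attributes.
    centres = np.zeros((k, X.shape[1]))
    for c in range(k):
        members = X[labelled_idx[y_labelled == c]]
        if len(members) > 0:
            centres[c] = members.mean(axis=0) * mask[c]

    assign = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centre.
        dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        new_assign[labelled_idx] = y_labelled  # labelled samples keep their labels
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        for c in range(k):
            members = X[assign == c]
            if len(members) > 0:
                centres[c] = members.mean(axis=0) * mask[c]
    return assign, centres, mask

# Toy term-count matrix (hypothetical): samples 0 and 2 are the labelled seeds.
X = [[1, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2],
     [0, 0, 2, 1], [1, 2, 0, 0], [0, 0, 1, 1]]
assign, centres, mask = seeded_kmeans(X, labelled_idx=[0, 2], y_labelled=[0, 1], k=2)
print(assign.tolist())  # [0, 0, 1, 1, 0, 1]
```

Masking the centres rather than the data keeps distances defined over the full attribute space, so a sample with mass on attributes outside a cluster's mask is penalised for that cluster. The paper itself may handle this differently.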
Pages: 57-68
Page count: 13
Related papers
50 in total
  • [11] Semi-supervised learning techniques: k-means clustering in OODB Fragmentation
    Darabant, AS
    Campan, A
    ICCC 2004: SECOND IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL CYBERNETICS, PROCEEDINGS, 2004, : 333 - 338
  • [12] Analysis and Improvement of Semi-Supervised K-means Clustering Based on Particle Swarm Optimization Algorithm
    Sun Y.
    Xia Q.-Z.
    Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2020, 43 (05): : 21 - 26
  • [13] Global Optimization for Semi-supervised K-means
    Sun, Xue
    Li, Kunlun
    Zhao, Rui
    Hu, Xikun
    2009 ASIA-PACIFIC CONFERENCE ON INFORMATION PROCESSING (APCIP 2009), VOL 2, PROCEEDINGS, 2009, : 410 - +
  • [14] An Improved Semi-supervised K-means Algorithm Based on Information Gain
    Liu Zhenpeng
    Guo Ding
    Zhang Xizhong
    Wang Xu
    Zhu Xianchao
    2014 IEEE 17TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2014, : 1960 - 1963
  • [15] Full and Semi-supervised k-Means Clustering Optimised by Class Membership Hesitation
    Plonski, Piotr
    Zaremba, Krzysztof
    ADAPTIVE AND NATURAL COMPUTING ALGORITHMS, ICANNGA 2013, 2013, 7824 : 218 - 225
  • [16] Two Semi-supervised Locality Sensitive K-Means Clustering Algorithms by Seeding
    Gu, Lei
    2012 IEEE FIFTH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTATIONAL INTELLIGENCE (ICACI), 2012, : 296 - 299
  • [17] Semi-Supervised Soft K-means Clustering of Life Insurance Questionnaire Responses
    Biddle, Rhys
    Liu, Shaowu
    Xu, Guandong
    2018 5TH INTERNATIONAL CONFERENCE ON BEHAVIORAL, ECONOMIC, AND SOCIO-CULTURAL COMPUTING (BESC), 2018, : 30 - 31
  • [18] Semi-supervised k-means clustering for multi-type relational data
    Gao, Ying
    Qi, Hong
    Liu, Da-You
    Liu, He
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 326 - 330
  • [19] A Semi-supervised Clustering Algorithm Based on Rough Reduction
    Lin, Liandong
    Qu, Wei
    Yu, Xiang
    CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009, : 5427 - +
  • [20] Semi-supervised Image Segmentation Based on K-means Algorithm and Random Walk
    Cai Xiumei
    Bian Jingwei
    Wang Yan
    Cui Qiaoqiao
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 2853 - 2856