A novel rough semi-supervised k-means algorithm for text clustering

Cited by: 4
Authors
Tang, Lei-yu [1]
Wang, Zhen-hao [1]
Wang, Shu-dong [2]
Fan, Jian-cong [1]
Yue, Guo-wei [3]
Affiliations
[1] Shandong Univ Sci & Technol, Coll Comp Sci & Engn, Qingdao, Peoples R China
[2] China Univ Petr, Coll Comp Sci & Technol, Qingdao, Peoples R China
[3] Shandong Univ Sci & Technol, Key Lab Min Disaster Prevent & Control, Qingdao 266590, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
rough set; approximation set; k-means algorithm; semi-supervised clustering; high-dimensional sparse data; MODEL
DOI
10.1504/IJBIC.2023.130548
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because many attribute values in high-dimensional sparse data are zero, we combine the approximation sets of rough set theory with the semi-supervised k-means algorithm and propose a rough set-based semi-supervised k-means (RSKmeans) algorithm. First, the proportion of non-zero values is calculated from a few labelled samples, and a small number of important attributes in each cluster are selected to calculate the clustering centres. Second, the approximation set is used to calculate the information gain of each attribute. Third, attribute values are partitioned into the corresponding approximation sets by comparing their information gain with the upper approximation and boundary threshold. New attributes are then added, and the process is repeated to update the clustering centres. Experimental results on text data show that the RSKmeans algorithm helps to find the important attributes, filter out invalid information, and significantly improve clustering performance.
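The first step of the abstract (seeding cluster centres from a few labelled samples using the attributes that are most often non-zero) can be illustrated with a minimal Python sketch. This is not the authors' RSKmeans implementation: the rough-set approximation sets and the information-gain step are omitted because the abstract does not specify them, and the names `nonzero_proportion`, `seed_centres`, `assign`, and the parameter `top_m` are hypothetical, as is the toy data.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def nonzero_proportion(rows: List[Vector]) -> List[float]:
    """Fraction of rows in which each attribute is non-zero."""
    n_attrs = len(rows[0])
    return [sum(1 for r in rows if r[j] != 0) / len(rows) for j in range(n_attrs)]


def seed_centres(X: List[Vector], labels: Dict[int, int], top_m: int) -> Dict[int, List[float]]:
    """Build one centre per cluster from the labelled rows only.

    labels maps row index -> cluster id. top_m (an assumed parameter) is the
    number of attributes kept per cluster; all other attributes are zeroed.
    """
    by_cluster: Dict[int, List[Vector]] = {}
    for idx, c in labels.items():
        by_cluster.setdefault(c, []).append(X[idx])

    centres: Dict[int, List[float]] = {}
    for c, rows in by_cluster.items():
        props = nonzero_proportion(rows)
        # Keep the top_m attributes that are most often non-zero in this cluster.
        keep = set(sorted(range(len(props)), key=lambda j: -props[j])[:top_m])
        centres[c] = [
            sum(r[j] for r in rows) / len(rows) if j in keep else 0.0
            for j in range(len(props))
        ]
    return centres


def assign(x: Vector, centres: Dict[int, List[float]]) -> int:
    """Assign x to the centre with the smallest squared Euclidean distance."""
    return min(centres, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centres[c])))


# Toy term-weight matrix: rows 0-3 are labelled, rows 4-5 are unlabelled.
X = [
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
]
labels = {0: 0, 1: 0, 2: 1, 3: 1}
centres = seed_centres(X, labels, top_m=1)
assignments = [assign(x, centres) for x in X[4:]]
print(centres, assignments)
```

In a full semi-supervised k-means loop, the assignment and centre update would be repeated until the centres stop changing. Restricting each centre to a few frequently non-zero attributes is one plausible reading of "a small number of important attributes in each cluster".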
Pages: 57-68
Number of pages: 13