Combining Semi-supervised Clustering and Classification Under a Generalized Framework

Authors
Jiang, Zhen [1, 2]
Zhao, Lingyun [1 ]
Lu, Yu [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Peoples R China
[2] Jiangsu Prov Big Data Ubiquitous Percept & Intelli, Zhenjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Co-training; Classification; Semi-supervised clustering; Cluster-splitting
DOI
10.1007/s00357-024-09489-9
Chinese Library Classification number
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
Most machine learning algorithms need a sufficient amount of labeled data to train a reliable classifier. However, labeling data is often costly and time-consuming, whereas unlabeled data is usually readily available. Learning from both labeled and unlabeled data has therefore become an active research topic. Inspired by the co-training algorithm, we present a learning framework called CSCC, which combines semi-supervised clustering and classification to learn from both labeled and unlabeled data. Unlike existing co-training style methods, which construct diverse classifiers that learn from each other, CSCC exploits the diversity between semi-supervised clustering and classification models to achieve mutual enhancement. Existing classification algorithms can be easily adapted to CSCC, allowing them to generalize from only a few labeled examples. In particular, to bridge the gap between class information and clustering, we propose a semi-supervised hierarchical clustering algorithm that uses labeled data to guide the cluster-splitting process. Within the CSCC framework, we introduce two loss functions that supervise the iterative updating of the semi-supervised clustering model and the classification model, respectively. Extensive experiments on a variety of benchmark datasets validate the superiority of CSCC over other state-of-the-art methods.
Pages: 181-204
Page count: 24
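The abstract describes a clustering model and a classification model that improve each other on partially labeled data. The sketch below illustrates only that general co-training idea and is not the CSCC algorithm: it substitutes a nearest-centroid classifier and a label-seeded k-means for the paper's models, and it uses simple agreement between the two in place of the paper's two loss functions and cluster-splitting procedure. All function names (`class_centroids`, `seeded_kmeans`, `co_train`) and the toy data are hypothetical.

```python
def mean(points):
    # Coordinate-wise mean of a non-empty list of equal-length tuples.
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    # centroids maps a class label to its centroid; return the closest label.
    return min(centroids, key=lambda c: sqdist(point, centroids[c]))

def class_centroids(points, labels):
    # Stand-in classifier: one centroid per class, ignoring unlabeled (None) points.
    classes = sorted({y for y in labels if y is not None})
    return {c: mean([p for p, y in zip(points, labels) if y == c]) for c in classes}

def seeded_kmeans(points, labels, iters=10):
    # Stand-in semi-supervised clustering: k-means initialised from the labeled
    # points, which stay pinned to their own class cluster throughout.
    centroids = class_centroids(points, labels)
    assign = list(labels)
    for _ in range(iters):
        assign = [y if y is not None else nearest(p, centroids)
                  for p, y in zip(points, labels)]
        centroids = class_centroids(points, assign)
    return assign

def co_train(points, labels, rounds=5):
    # Each round, unlabeled points on which both models agree are pseudo-labeled
    # and added to the training set used in the next round.
    labels = list(labels)  # work on a copy; the caller's list is not modified
    for _ in range(rounds):
        unlabeled = [i for i, y in enumerate(labels) if y is None]
        if not unlabeled:
            break
        clf = class_centroids(points, labels)
        clusters = seeded_kmeans(points, labels)
        agreed = [i for i in unlabeled if nearest(points[i], clf) == clusters[i]]
        if not agreed:
            break
        for i in agreed:
            labels[i] = clusters[i]
    # Any point still unlabeled falls back to the classifier's prediction.
    clf = class_centroids(points, labels)
    return [y if y is not None else nearest(p, clf) for p, y in zip(points, labels)]

# Toy data (an assumption for illustration): two 3x3 grids of points,
# with a single labeled example per class.
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
points = grid + [(x + 10, y + 10) for x, y in grid]
truth = [0] * 9 + [1] * 9
labels = [None] * 18
labels[0], labels[9] = 0, 1
predicted = co_train(points, labels)
```

On data this well separated the two models agree immediately, so the loop finishes after one round; the mutual-enhancement effect the abstract refers to would only show up on harder data and with genuinely different models.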