Combining Semi-supervised Clustering and Classification Under a Generalized Framework

Cited by: 0
Authors
Jiang, Zhen [1, 2]
Zhao, Lingyun [1 ]
Lu, Yu [1 ]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang, Peoples R China
[2] Jiangsu Prov Big Data Ubiquitous Percept & Intelli, Zhenjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Co-training; Classification; Semi-supervised clustering; Cluster-splitting;
DOI
10.1007/s00357-024-09489-9
Chinese Library Classification number
O1 [Mathematics];
Subject classification code
0701 ; 070101 ;
Abstract
Most machine learning algorithms need a sufficient amount of labeled data to train a reliable classifier. However, labeling data is often costly and time-consuming, whereas unlabeled data is usually readily available. Learning from both labeled and unlabeled data has therefore become an active research topic. Inspired by the co-training algorithm, we present a learning framework called CSCC, which combines semi-supervised clustering and classification to learn from both labeled and unlabeled data. Unlike existing co-training style methods, which construct diverse classifiers that learn from each other, CSCC leverages the diversity between semi-supervised clustering and classification models to achieve mutual enhancement. Existing classification algorithms can be easily adapted to CSCC, allowing them to generalize from a small amount of labeled data. In particular, to bridge the gap between class information and clustering, we propose a semi-supervised hierarchical clustering algorithm that uses labeled data to guide the cluster-splitting process. Within the CSCC framework, we introduce two loss functions that supervise the iterative updating of the semi-supervised clustering model and the classification model, respectively. Extensive experiments on a variety of benchmark datasets validate the superiority of CSCC over other state-of-the-art methods.
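The idea of letting labeled data guide cluster-splitting can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. It is a minimal stand-in that assumes a simple rule: a cluster is split with 2-means whenever it contains labeled points from more than one class, and each resulting single-class cluster passes its class to its unlabeled members. All function names (`two_means`, `split_clusters`, `pseudo_labels`) and the toy data are hypothetical.

```python
# Illustrative sketch only; NOT the CSCC algorithm from the paper.
# Assumption: split any cluster holding labeled points of 2+ classes.

def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(points, c0, c1, iters=10):
    """Plain 2-means from the given initial centers; returns 0/1 per point."""
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [0 if sqdist(p, c0) <= sqdist(p, c1) else 1 for p in points]
        g0 = [p for p, a in zip(points, assign) if a == 0]
        g1 = [p for p, a in zip(points, assign) if a == 1]
        if not g0 or not g1:
            break  # degenerate split; the caller handles it
        c0, c1 = mean(g0), mean(g1)
    return assign

def split_clusters(points, labels):
    """labels[i] is a class name or None (unlabeled). Returns index lists."""
    done, stack = [], [list(range(len(points)))]
    while stack:
        idx = stack.pop()
        classes = sorted({labels[i] for i in idx if labels[i] is not None})
        if len(classes) <= 1:
            done.append(idx)  # at most one class present: stop splitting
            continue
        # Seed 2-means with the labeled means of two conflicting classes.
        seeds = [mean([points[i] for i in idx if labels[i] == c])
                 for c in classes[:2]]
        assign = two_means([points[i] for i in idx], seeds[0], seeds[1])
        left = [i for i, a in zip(idx, assign) if a == 0]
        right = [i for i, a in zip(idx, assign) if a == 1]
        if not left or not right:
            done.append(idx)  # cannot be separated; keep as a mixed cluster
            continue
        stack.extend([left, right])
    return done

def pseudo_labels(points, labels):
    """Give unlabeled points the class of their single-class cluster."""
    out = list(labels)
    for idx in split_clusters(points, labels):
        classes = {labels[i] for i in idx if labels[i] is not None}
        if len(classes) == 1:
            cls = next(iter(classes))
            for i in idx:
                out[i] = cls
    return out

# Toy data: two well-separated groups with one labeled point each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", None, None, "b", None, None]
result = pseudo_labels(points, labels)
print(result)
```

In a co-training style loop, pseudo-labels like these could be handed to a classifier, whose confident predictions would in turn be fed back as extra labels for the next round of splitting. How CSCC actually performs this exchange, and the two loss functions it uses, are described in the paper rather than here.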
Pages: 181-204
Page count: 24