STDPboost: A Self-Training Method Based on Density Peaks and Improved Adaboost for Semi-Supervised Classification

Cited by: 0
Authors
Lin, Xu [1]
Li, Junnan [2]
Affiliations
[1] Anhui Sanlian Univ, Sch Comp Engn, Hefei 230601, Anhui, Peoples R China
[2] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Classification algorithms; Prediction algorithms; Iterative methods; Forestry; Clustering algorithms; Predictive models; Semisupervised learning; Semi-supervised learning; semi-supervised classification; self-training methods; oversampling techniques; Adaboost; FRAMEWORK; NEIGHBOR;
DOI
10.1109/ACCESS.2023.3294982
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Self-training methods have attracted extensive research attention in semi-supervised classification, and mislabeling is their main challenge. Multiple variants of self-training methods have recently been proposed to combat mislabeling from one of two aspects: a) using heuristic rules to find high-confidence unlabeled samples that can easily be predicted correctly in each iteration; b) enhancing prediction performance by employing ensemble classifiers composed of multiple weak classifiers. Yet, they still suffer from the following issues: a) most strategies for finding high-confidence unlabeled samples rely heavily on parameters; b) almost all of the employed ensemble classifiers were originally designed for supervised classification and may not suit semi-supervised classification, owing to the limited number and unrepresentative distribution of the initial labeled data; c) few can counter mislabeling from both aspects at the same time. To advance the state of the art, a new self-training method based on density peaks clustering and an improved Adaboost, named STDPboost, is presented. In the iterative self-taught process, a new density-peaks-clustering-based strategy is proposed to find high-confidence unlabeled samples, and a new ensemble classifier named AdaboostSEMI, better suited to semi-supervised classification, is proposed to predict their labels; together these overcome mislabeling and the above shortcomings of existing self-training methods. Intensive experiments on benchmark data sets show that, by further alleviating mislabeling, STDPboost outperforms 7 state-of-the-art self-training methods in the average classification accuracy of KNN and CART classifiers when the percentage of initially labeled data ranges from 10% to 50%.
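The following Python sketch illustrates the general scheme the abstract describes: a density-peaks structure selects high-confidence unlabeled samples (here, those whose nearest higher-density neighbor already carries a label), and a boosted ensemble pseudo-labels them each round. This is a minimal, hypothetical reading of the method, not the authors' algorithm: scikit-learn's AdaBoostClassifier stands in for the paper's AdaboostSEMI, and the helper names (`density_peaks_links`, `dc_percent`) and the confidence rule are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import pairwise_distances


def density_peaks_links(X, dc_percent=2.0):
    """For each sample, locate its nearest neighbor of higher local density.

    Returns (rho, leader): rho[i] is a Gaussian-kernel local density and
    leader[i] is the index of the closest point with rho > rho[i]
    (-1 for the global density peak).
    """
    D = pairwise_distances(X)
    dc = np.percentile(D[D > 0], dc_percent)        # cutoff-distance heuristic
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # subtract the self term
    leader = np.full(len(X), -1, dtype=int)
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            leader[i] = higher[np.argmin(D[i, higher])]
    return rho, leader


def self_train_dp_boost(X, y, n_iter=20):
    """Self-training loop; y uses -1 to mark unlabeled samples.

    Each round, the high-confidence set is the unlabeled samples whose
    density-peaks leader is already (pseudo-)labeled; the current boosted
    ensemble assigns their pseudo-labels.
    """
    y = y.copy()
    _, leader = density_peaks_links(X)
    for _ in range(n_iter):
        labeled = y != -1
        confident = np.where((~labeled) & (leader >= 0) & (y[leader] != -1))[0]
        if confident.size == 0:
            break
        clf = AdaBoostClassifier(n_estimators=50).fit(X[labeled], y[labeled])
        y[confident] = clf.predict(X[confident])
    labeled = y != -1
    return AdaBoostClassifier(n_estimators=50).fit(X[labeled], y[labeled])
```

Marking unlabeled samples with -1 mirrors scikit-learn's semi-supervised convention, and the percentile-based cutoff distance is the standard density-peaks heuristic; the sketch assumes the initial labeled set covers every class.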
Pages: 72974 - 72989
Page count: 16