STDPboost: A Self-Training Method Based on Density Peaks and Improved Adaboost for Semi-Supervised Classification

Cited by: 0
Authors
Lin, Xu [1]
Li, Junnan [2]
Affiliations
[1] Anhui Sanlian Univ, Sch Comp Engn, Hefei 230601, Anhui, Peoples R China
[2] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Classification algorithms; Prediction algorithms; Iterative methods; Forestry; Clustering algorithms; Predictive models; Semisupervised learning; Semi-supervised learning; semi-supervised classification; self-training methods; oversampling techniques; Adaboost; FRAMEWORK; NEIGHBOR;
DOI
10.1109/ACCESS.2023.3294982
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Self-training methods have attracted extensive research attention in semi-supervised classification, and mislabeling is their main challenge. Multiple variants of self-training methods have recently been proposed to combat mislabeling from one of two aspects: a) using heuristic rules to find high-confidence unlabeled samples that can easily be predicted correctly in each iteration; b) enhancing prediction performance by employing ensemble classifiers composed of multiple weak classifiers. Yet, they still suffer from the following issues: a) most strategies for finding high-confidence unlabeled samples rely heavily on parameters; b) almost all of the employed ensemble classifiers were originally designed for supervised classification and may not suit semi-supervised classification, owing to the limited number and unrepresentative distribution of the initial labeled data; c) few can counter mislabeling from both aspects at the same time. To advance the state of the art, a new self-training method based on density peaks clustering and an improved Adaboost, named STDPboost, is presented. In the iterative self-taught process, a new density-peaks-clustering-based strategy is proposed to find high-confidence unlabeled samples, and a new ensemble classifier named AdaboostSEMI, better suited to semi-supervised classification, is proposed to predict their labels; together these overcome mislabeling and the above shortcomings of existing self-training methods. Intensive experiments on benchmark data sets show that, by further alleviating mislabeling, STDPboost outperforms 7 state-of-the-art self-training methods in the average classification accuracy of KNN and CART classifiers when the percentage of initially labeled data ranges from 10% to 50%.
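The following Python sketch illustrates the general scheme the abstract describes: a density-peaks structure selects high-confidence unlabeled samples (here, those whose nearest higher-density neighbor already carries a label), and a boosted ensemble pseudo-labels them each round. This is a minimal, hypothetical reading of the method, not the authors' algorithm: scikit-learn's AdaBoostClassifier stands in for the paper's AdaboostSEMI, and the helper names (`density_peaks_links`, `dc_percent`) and the confidence rule are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import pairwise_distances


def density_peaks_links(X, dc_percent=2.0):
    """For each sample, locate its nearest neighbor of higher local density.

    Returns (rho, leader): rho[i] is a Gaussian-kernel local density and
    leader[i] is the index of the closest point with rho > rho[i]
    (-1 for the global density peak).
    """
    D = pairwise_distances(X)
    dc = np.percentile(D[D > 0], dc_percent)        # cutoff-distance heuristic
    rho = np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # subtract the self term
    leader = np.full(len(X), -1, dtype=int)
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            leader[i] = higher[np.argmin(D[i, higher])]
    return rho, leader


def self_train_dp_boost(X, y, n_iter=20):
    """Self-training loop; y uses -1 to mark unlabeled samples.

    Each round, the high-confidence set is the unlabeled samples whose
    density-peaks leader is already (pseudo-)labeled; the current boosted
    ensemble assigns their pseudo-labels.
    """
    y = y.copy()
    _, leader = density_peaks_links(X)
    for _ in range(n_iter):
        labeled = y != -1
        confident = np.where((~labeled) & (leader >= 0) & (y[leader] != -1))[0]
        if confident.size == 0:
            break
        clf = AdaBoostClassifier(n_estimators=50).fit(X[labeled], y[labeled])
        y[confident] = clf.predict(X[confident])
    labeled = y != -1
    return AdaBoostClassifier(n_estimators=50).fit(X[labeled], y[labeled])
```

Marking unlabeled samples with -1 mirrors scikit-learn's semi-supervised convention, and the percentile-based cutoff distance is the standard density-peaks heuristic; the sketch assumes the initial labeled set covers every class.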
Pages: 72974 - 72989
Page count: 16