A semi-supervised self-training method based on density peaks and natural neighbors

Cited by: 18
Authors
Zhao, Suwen [1 ,2 ]
Li, Junnan [3 ]
Affiliations
[1] Chongqing Univ, Coll Bioengn, Chongqing 400044, Peoples R China
[2] Guilin Univ Aerosp Technol, Dept Elect Engn, Guilin 541004, Peoples R China
[3] Chongqing Univ, Coll Comp Sci, Chongqing Key Lab Software Theory & Technol, Chongqing 400044, Peoples R China
Keywords
Self-training method; Semi-supervised classification; Semi-supervised learning; Natural neighbors; Density peaks; CLASSIFICATION; SEARCH;
DOI
10.1007/s12652-020-02451-8
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The semi-supervised self-training method is one of the successful methodologies of semi-supervised classification and can train a classifier by exploiting both labeled and unlabeled data. However, most self-training methods are limited by the distribution of the initial labeled data, rely heavily on parameters, and predict poorly during the self-training process. To solve these problems, a novel self-training method based on density peaks and natural neighbors (STDPNaN) is proposed. In STDPNaN, an improved parameter-free density peaks clustering (DPCNaN) is first presented by introducing natural neighbors. DPCNaN can reveal the real structure and distribution of the data without any parameter, and thus helps STDPNaN restore the real data space whether its distribution is spherical or non-spherical. An ensemble classifier is also employed to improve the predictive ability of STDPNaN during the self-training process. Intensive experiments show that (a) STDPNaN outperforms state-of-the-art methods in improving the classification accuracy of k-nearest neighbor, support vector machine, and classification and regression tree classifiers; (b) STDPNaN also outperforms the comparison methods without any restriction on the number of labeled data; and (c) the running time of STDPNaN is acceptable.
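The parameter-free natural-neighbor search that the abstract attributes to DPCNaN can be sketched as below. This is an assumed, generic implementation of the standard natural-neighbor algorithm from the clustering literature (grow the neighborhood size r until the set of points with no reverse r-nearest neighbors stops shrinking), not the authors' code; the function name and return values are illustrative only.

```python
import numpy as np

def natural_neighbor_search(X):
    """Sketch of parameter-free natural-neighbor search.

    Grows the neighborhood size r until the number of points that are
    nobody's r-nearest neighbor (orphans) stops decreasing or reaches
    zero. The final r plays the role of the 'natural eigenvalue' that
    replaces a user-chosen k; reverse_count[i] is how many points
    adopted point i as a neighbor.
    """
    n = len(X)
    # pairwise Euclidean distances, self-distance masked out
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    order = np.argsort(d, axis=1)           # each row: neighbors by distance
    reverse_count = np.zeros(n, dtype=int)  # times each point is adopted
    prev_orphans = -1
    for r in range(n - 1):
        # every point adopts its (r+1)-th nearest neighbor
        np.add.at(reverse_count, order[:, r], 1)
        orphans = int(np.sum(reverse_count == 0))
        if orphans == 0 or orphans == prev_orphans:
            return r + 1, reverse_count
        prev_orphans = orphans
    return n - 1, reverse_count
```

In a density-peaks setting, `reverse_count` would then serve as a parameter-free local density estimate; points with high counts in dense regions are candidate peaks.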
Pages: 2939-2953 (15 pages)