Semi-supervised neighborhood discrimination index for feature selection

Cited by: 31
Authors
Pang, Qing-Qing [1 ,2 ]
Zhang, Li [1 ,2 ,3 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Jiangsu, Peoples R China
[2] Soochow Univ, Joint Int Res Lab Machine Learning & Neuromorph C, Suzhou 215006, Jiangsu, Peoples R China
[3] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou 215006, Jiangsu, Peoples R China
Keywords
Semi-supervised; Feature selection; Neighborhood discriminant index; Mutual information; Regression
DOI
10.1016/j.knosys.2020.106224
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neighborhood discriminant index (NDI) is an effective feature selection method for supervised learning. In reality, unlabeled data are easy to obtain but costly to label. Thus, a given dataset commonly contains only a small number of labeled samples and a large number of unlabeled ones, which cannot be handled by supervised learning methods. For this situation, we propose a semi-supervised feature selection method called semi-supervised neighborhood discriminant index (SSNDI), which combines NDI and the Laplacian score method to effectively deal with both labeled and unlabeled samples. The goal of SSNDI is to find an optimal feature subset that both preserves the local geometrical structure and distinguishes samples belonging to different classes. In SSNDI, the classical Laplacian score method is modified to cooperate with the iterative form of NDI. In each iteration, SSNDI selects an important feature according to a new criterion that mixes NDI and the modified Laplacian score. Extensive experiments are conducted on UCI and microarray gene datasets. The experimental results confirm that SSNDI achieves better performance than NDI and other state-of-the-art semi-supervised methods. (C) 2020 Elsevier B.V. All rights reserved.
Pages: 11
相关论文
共 41 条
[1]   Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection [J].
Ang, Jun Chin ;
Mirzal, Andri ;
Haron, Habibollah ;
Hamed, Haza Nuzly Abdull .
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2016, 13 (05) :971-989
[2]  
[Anonymous], 2000, Pattern Classification
[3]   MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia [J].
Armstrong, SA ;
Staunton, JE ;
Silverman, LB ;
Pieters, R ;
de Boer, ML ;
Minden, MD ;
Sallan, SE ;
Lander, ES ;
Golub, TR ;
Korsmeyer, SJ .
NATURE GENETICS, 2002, 30 (01) :41-47
[4]  
Bishop C. M., 1995, ADV COMP, V12, P1235
[5]   Feature weight estimation for gene selection: a local hyperlinear learning approach [J].
Cai, Hongmin ;
Ruan, Peiying ;
Ng, Michael ;
Akutsu, Tatsuya .
BMC BIOINFORMATICS, 2014, 15
[6]   Predictive Ensemble Pruning by Expectation Propagation [J].
Chen, Huanhuan ;
Tino, Peter ;
Yao, Xin .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2009, 21 (07) :999-1013
[7]   Online selection of discriminative tracking features [J].
Collins, RT ;
Liu, YX ;
Leordeanu, M .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005, 27 (10) :1631-1643
[8]  
Dua D., 2017, UCI machine learning repository
[9]   MULTIPLE COMPARISONS AMONG MEANS [J].
DUNN, OJ .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1961, 56 (293) :52-&