An efficient semi-supervised representatives feature selection algorithm based on information theory

Cited by: 51
Authors
Wang, Yintong [1 ,2 ]
Wang, Jiandong [2 ]
Liao, Hao [3 ]
Chen, Haiyan [2 ]
Affiliations
[1] Nanjing XiaoZhuang Univ, Key Lab Trusted Cloud Comp & Big Data Anal, Nanjing 211171, Jiangsu, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Jiangsu, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Guangdong Prov Key Lab Popular High Performance C, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; Markov blanket; Information theory; Semi-supervised learning; Representative features; UNSUPERVISED FEATURE-SELECTION; DIMENSIONALITY REDUCTION; RELEVANCE; CAUSAL;
DOI
10.1016/j.patcog.2016.08.011
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Feature selection (FS) plays an important role in data mining and pattern recognition, especially for large-scale text, image, and biological data. The Markov blanket provides a sound and complete solution to selecting the optimal feature subset in supervised feature selection, since it captures both the relevance of features to the class and the conditional independence relationships among features. However, incomplete label information makes it particularly difficult to acquire the optimal feature subset. In this paper, we propose a novel algorithm, the Semi-supervised Representatives Feature Selection algorithm based on information theory (SRFS), which is independent of any particular classification learner and can rapidly and effectively identify and remove non-essential, irrelevant, and redundant features. More importantly, unlabeled data are incorporated into the Markov blanket, in the same way as labeled data, through a relevance gain measure. Our results on several benchmark datasets demonstrate that SRFS significantly improves upon state-of-the-art supervised and semi-supervised algorithms. (C) 2016 Elsevier Ltd. All rights reserved.
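To make the abstract concrete, the following is a minimal illustrative sketch, not the authors' SRFS implementation: it ranks discrete (or pre-discretized) features by mutual information with the class and removes a feature when an already-selected feature renders it approximately conditionally independent of the class, a simplified Markov-blanket-style redundancy test. The handling of unlabeled samples here (simply excluding them from the relevance scores) is a crude placeholder for the paper's relevance gain, and all names (conditional_mi, select_features, redundancy_threshold) are assumptions for illustration only.

# Illustrative sketch only; assumes discrete feature values and labels,
# with y == -1 marking unlabeled samples. Not the SRFS algorithm itself.
import numpy as np
from sklearn.metrics import mutual_info_score


def conditional_mi(x, y, z):
    """Estimate I(x; y | z) for discrete variables by averaging MI over values of z."""
    cmi, n = 0.0, len(z)
    for value in np.unique(z):
        mask = (z == value)
        if mask.sum() > 1:
            cmi += (mask.sum() / n) * mutual_info_score(x[mask], y[mask])
    return cmi


def select_features(X, y, k=10, redundancy_threshold=1e-3):
    """Greedy MI-based feature selection with an approximate Markov-blanket redundancy check.

    X: (n_samples, n_features) array of discrete feature values.
    y: (n_samples,) class labels; entries equal to -1 are treated as unlabeled
       and ignored when scoring relevance (a stand-in for relevance gain).
    """
    labeled = (y != -1)
    # Relevance of each feature to the class, estimated on labeled samples only.
    relevance = np.array([mutual_info_score(X[labeled, j], y[labeled])
                          for j in range(X.shape[1])])
    order = np.argsort(relevance)[::-1]  # most relevant first
    selected = []
    for j in order:
        # Feature j is redundant if some selected feature s makes it nearly
        # conditionally independent of the class: I(X_j; Y | X_s) ~ 0.
        redundant = any(
            conditional_mi(X[labeled, j], y[labeled], X[labeled, s]) < redundancy_threshold
            for s in selected
        )
        if not redundant:
            selected.append(j)
        if len(selected) == k:
            break
    return selected

In this sketch, the conditional mutual information test plays the role of the blanket-style filtering described in the abstract: a feature whose information about the class is already carried by a selected "representative" feature is discarded rather than added to the subset.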
Pages: 511-523
Page count: 13