A Supervised Feature Selection Algorithm through Minimum Spanning Tree Clustering

Cited by: 12
Authors
Liu, Qin [1]
Zhang, Jingxiao [1]
Xiao, Jiakai [1]
Zhu, Hongming [1]
Zhao, Qinpei [1]
Affiliations
[1] Tongji Univ, Sch Software Engn, Shanghai 200092, Peoples R China
Source
2014 IEEE 26TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI) | 2014
Keywords
Supervised Feature Selection; Feature Clustering; Minimum Spanning Tree; Variation of Information; INFORMATION; RELEVANCE;
DOI
10.1109/ICTAI.2014.47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Among the various types of feature selection algorithms, feature clustering is an emerging paradigm for subset generation. This paper proposes a Minimum spanning tree based Feature Clustering (MFC) algorithm. The algorithm uses an information-theoretic measure, variation of information, as the metric for both feature redundancy and feature relevance. In the clustering phase, the sum of pairwise feature redundancy is minimized. A representative feature is then selected from each cluster so that the relevance between the representative features and the target label is maximized. The algorithm is supervised in that it is designed for supervised learning problems such as classification and regression. MFC is compared with three conventional feature selection algorithms, two of which are also feature clustering methods. In the experiments, MFC obtains well-separated feature clusters and considerably better classification accuracies on several real data sets.
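The pipeline described in the abstract (a variation-of-information distance between features, a minimum spanning tree over the features, then one representative per cluster) can be illustrated with a minimal Python sketch. This is not the authors' MFC implementation: the abstract does not say how the tree is partitioned or how relevance is scored in detail, so the sketch assumes discrete features, cuts the k-1 heaviest tree edges, and picks the feature with the smallest variation of information to the label. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def variation_of_information(x, y):
    """VI(X, Y) = H(X|Y) + H(Y|X) = 2 H(X, Y) - H(X) - H(Y)."""
    return 2 * entropy(list(zip(x, y))) - entropy(x) - entropy(y)

def mst_edges(dist):
    """Prim's algorithm on a dense symmetric matrix; returns (weight, i, j) edges."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        w, i, j = min((dist[i][j], i, j)
                      for i in in_tree for j in range(n) if j not in in_tree)
        edges.append((w, i, j))
        in_tree.add(j)
    return edges

def cluster_features(features, k):
    """Assumed cutting rule: drop the k-1 heaviest MST edges to get k clusters."""
    n = len(features)
    dist = [[variation_of_information(features[i], features[j]) for j in range(n)]
            for i in range(n)]
    kept = sorted(mst_edges(dist))[: n - k]  # the n-k lightest of the n-1 tree edges
    parent = list(range(n))                  # union-find over feature indices

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for _, i, j in kept:
        parent[find(i)] = find(j)
    clusters = {}
    for f in range(n):
        clusters.setdefault(find(f), []).append(f)
    return sorted(clusters.values())

def select_representatives(features, label, k):
    """Assumed relevance rule: smallest VI to the label within each cluster."""
    return [min(c, key=lambda f: variation_of_information(features[f], label))
            for c in cluster_features(features, k)]

# Hypothetical toy data: features 0 and 1 carry the same information
# (one is the inversion of the other); feature 3 equals the label and
# feature 2 is a noisy copy of it.
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1, 0],
]
label = [0, 1, 0, 1, 0, 1, 1, 0]
print(cluster_features(features, 2))
print(select_representatives(features, label, 2))
```

Variation of information is a true metric on discrete variables, which is what makes it usable as an edge weight here. Continuous features would need discretization first, a step this sketch leaves out.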
Pages: 264-271
Number of pages: 8