A graph-based approach to feature selection

被引:32
作者
Zhang Z. [1 ]
Hancock E.R. [1 ]
机构
[1] Department of Computer Science, University of York
来源
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2011年 / 6658 LNCS卷
关键词
Graphic methods - Combinatorial optimization - Clustering algorithms;
D O I
10.1007/978-3-642-20844-7_21
中图分类号
学科分类号
摘要
In many data analysis tasks, one is often confronted with very high dimensional data. The feature selection problem is essentially a combinatorial optimization problem which is computationally expensive. To overcome this problem it is frequently assumed either that features independently influence the class variable or do so only involving pairwise feature interaction. To tackle this problem, we propose an algorithm consisting of three phases, namely, i) it first constructs a graph in which each node corresponds to each feature, and each edge has a weight corresponding to mutual information (MI) between features connected by that edge, ii) then perform dominant set clustering to select a highly coherent set of features, iii) further selects features based on a new measure called multidimensional interaction information (MII). The advantage of MII is that it can consider third or higher order feature interaction. By the help of dominant set clustering, which separates features into clusters in advance, thereby allows us to limit the search space for higher order interactions. Experimental results demonstrate the effectiveness of our feature selection method on a number of standard data-sets. © 2011 Springer-Verlag Berlin Heidelberg.
引用
收藏
页码:205 / 214
页数:9
相关论文
共 10 条
[1]  
Battiti R., Using mutual information for selecting features in supervised neural net learning, IEEE Transactions on Neural. Networks, 5, 4, pp. 537-550, (2002)
[2]  
Cheng H., Qin Z., Qian W., Liu W., Conditional mutual information based feature selection, IEEE International Symposium on Knowledge Acquisition and Modeling, pp. 103-107, (2008)
[3]  
Devijver P., Kittler J., Pattern Recognition: A Statistical Approach, 761, (1982)
[4]  
Guo B., Nixon M., Gait feature subset selection by mutual information, IEEE Transactions on Systems, Man. and Cybernetics, Part A: Systems and Humans, 39, 1, pp. 36-46, (2008)
[5]  
Kwak N., Choi C., Input feature selection by mutual information based on parzen window, IEEE TPAMI, 24, 12, pp. 1667-1671, (2002)
[6]  
Pavan M., Pelillo M., A new graph-theoretic approach to clustering and segmentation, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1, (2003)
[7]  
Peng H., Long F., Ding C., Feature selection based on mutual information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 8, pp. 1226-1238, (2005)
[8]  
Shannon C., A mathematical theory of communication, ACM SIGMOBILE Mobile Computing and Communications Review, 5, 1, pp. 3-55, (2001)
[9]  
Yang H., Moody J., Feature selection based on joint mutual information, Proceedings of International ICSC Symposium on Advances in Intelligent Data Analysis, pp. 22-25, (1999)
[10]  
Zhang F., Zhao Y., Fen J., Unsupervised feature selection based on feature relevance, International Conference on Machine Learning and Cybernetics, 1, pp. 487-492, (2009)