A variance reduction framework for stable feature selection

Cited by: 26
Authors
Han, Yue [1]
Yu, Lei [1]
Affiliation
[1] Department of Computer Science, Binghamton University, State University of New York, Binghamton, NY 13902-6000, United States
Source
Statistical Analysis and Data Mining | 2012 / Vol. 5 / Issue 5
Keywords
Clustering algorithms; Data reduction; Stability
DOI
10.1002/sam.11152
Abstract
Stability of feature selection is an important but under-addressed issue in knowledge discovery from high-dimensional data. In this study, we present a theoretical framework about the relationship between the stability and the accuracy of feature selection based on a formal bias-variance decomposition of feature selection error. The framework also reveals the connection between stability and sample size and suggests a variance reduction approach for improving the stability of feature selection algorithms under small sample size. Following the theoretical framework, we propose an empirical variance reduction framework, margin-based instance weighting, which weights training instances according to their importance to feature evaluation. Our extensive experimental study first verifies the theoretical and empirical frameworks based on synthetic data sets and a popular feature selection algorithm SVM-RFE. Experiments based on real-world microarray data sets further verify that the empirical framework is effective at reducing the variance and improving the subset stability of two representative feature selection algorithms, SVM-RFE and ReliefF, while maintaining comparable predictive accuracy based on the selected features. The proposed instance weighting framework is also shown to be more effective and efficient than the ensemble framework at improving the subset stability of the feature selection algorithms under study. © 2012 Wiley Periodicals, Inc.
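The notion of subset stability mentioned in the abstract can be illustrated with a small sketch. The record does not state which stability measure the authors use, so the example below assumes one common choice, the average pairwise Jaccard similarity between the feature subsets selected on different training samples. The function names `jaccard` and `subset_stability` and the toy subsets are hypothetical and serve only as illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def subset_stability(subsets):
    """Mean pairwise Jaccard similarity over all selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets to measure stability")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy example: feature indices selected on three resampled training sets.
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
score = subset_stability(runs)
print(round(score, 3))
```

A score of 1.0 means every run selected the same features, and lower values indicate that the selected subset changes more from one training sample to the next.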
Pages: 428-445