Online Feature Selection with Streaming Features

Cited by: 213
Authors
Wu, Xindong [1, 2]
Yu, Kui [1]
Ding, Wei [3]
Wang, Hao [1]
Zhu, Xingquan [4]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[3] Univ Massachusetts, Dept Comp Sci, Coll Sci & Math, Boston, MA 02125 USA
[4] Univ Technol Sydney, Fac Engn & Informat Technol, Ctr Quantum Computat & Intelligent Syst, Broadway, NSW 2007, Australia
Funding
US National Science Foundation; Australian Research Council; National Natural Science Foundation of China
Keywords
Feature selection; streaming features; supervised learning; CAUSAL DISCOVERY; CONSISTENCY;
DOI
10.1109/TPAMI.2012.197
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a new online feature selection framework for applications with streaming features where the knowledge of the full feature space is unknown in advance. We define streaming features as features that flow in one by one over time whereas the number of training examples remains fixed. This is in contrast with traditional online learning methods that only deal with sequentially added observations, with little attention being paid to streaming features. The critical challenges for Online Streaming Feature Selection (OSFS) include 1) the continuous growth of feature volumes over time, 2) a large feature space, possibly of unknown or infinite size, and 3) the unavailability of the entire feature set before learning starts. In the paper, we present a novel Online Streaming Feature Selection method to select strongly relevant and nonredundant features on the fly. An efficient Fast-OSFS algorithm is proposed to improve feature selection performance. The proposed algorithms are evaluated extensively on high-dimensional datasets and also with a real-world case study on impact crater detection. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.
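The abstract's central idea, keeping features that are relevant and nonredundant as they arrive one by one over a fixed set of training examples, can be illustrated with a minimal sketch. This is not the authors' OSFS or Fast-OSFS algorithm: it assumes continuous features and swaps in thresholded Pearson and first-order partial correlations as a stand-in for proper conditional-independence tests. The function names, the `threshold` value, and the synthetic data in `make_demo` are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt(max(0.0, (1 - rxz ** 2) * (1 - ryz ** 2)))
    if denom < 1e-12:
        return 0.0  # x or y is (almost) fully explained by z
    return (rxy - rxz * ryz) / denom


def select_streaming(feature_stream, y, threshold=0.3):
    """Process (name, column) pairs one at a time; return the names kept.

    Assumption: |correlation| below `threshold` is treated as independence.
    """
    selected = {}  # name -> column, in arrival order
    for name, col in feature_stream:
        # 1) Relevance: discard features that look unrelated to the target.
        if abs(pearson(col, y)) < threshold:
            continue
        # 2) Redundancy of the newcomer: discard it if some kept feature
        #    already explains its association with the target.
        if any(abs(partial_corr(col, y, kept)) < threshold
               for kept in selected.values()):
            continue
        selected[name] = col
        # 3) Redundancy of earlier features: re-check them given the newcomer.
        for other in [k for k in selected if k != name]:
            if abs(partial_corr(selected[other], y, col)) < threshold:
                del selected[other]
    return list(selected)


def make_demo(n=500, seed=0):
    """Synthetic stream (hypothetical data): y depends on a and b only."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    a_copy = [v + 0.01 * rng.gauss(0, 1) for v in a]  # near-duplicate of a
    y = [u + v + 0.1 * rng.gauss(0, 1) for u, v in zip(a, b)]
    stream = [("noise", noise), ("a", a), ("a_copy", a_copy), ("b", b)]
    return stream, y


if __name__ == "__main__":
    stream, y = make_demo()
    print(select_streaming(stream, y))
```

In this sketch each arriving feature is tested only against single previously kept features, which keeps the cost per feature linear in the size of the selected set. A faithful implementation would condition on subsets of the selected features and use a statistical test rather than a fixed cutoff.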
Pages: 1178-1192
Number of pages: 15