A framework for cost-based feature selection

Cited by: 60
Authors
Bolon-Canedo, V. [1]
Porto-Diaz, I. [1]
Sanchez-Marono, N. [1]
Alonso-Betanzos, A. [1]
Affiliations
[1] Univ A Coruna, Lab Res & Dev Artificial Intelligence LIDIA, Dept Comp Sci, La Coruna 15071, Spain
Keywords
Cost-based feature selection; Machine learning; Filter methods; Neural networks
DOI
10.1016/j.patcog.2014.01.008
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Over the last few years, the dimensionality of datasets involved in data mining applications has increased dramatically. In this situation, feature selection becomes indispensable as it allows for dimensionality reduction and relevance detection. The research proposed in this paper broadens the scope of feature selection by taking into consideration not only the relevance of the features but also their associated costs. A new general framework is proposed, which consists of adding a new term to the evaluation function of a filter feature selection method so that the cost is taken into account. Although the proposed methodology could be applied to any feature selection filter, in this paper the approach is applied to two representative filter methods: Correlation-based Feature Selection (CFS) and Minimal-Redundancy-Maximal-Relevance (mRMR), as an example of use. The behavior of the proposed framework is tested on 17 heterogeneous classification datasets, employing a Support Vector Machine (SVM) as a classifier. The results of the experimental study show that the approach is sound and that it allows the user to reduce the cost without compromising the classification error. (C) 2014 Elsevier Ltd. All rights reserved.
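The abstract says the framework adds a cost term to a filter's evaluation function but does not give the formula. The sketch below illustrates one plausible reading, assuming the added term is a penalty `lam * cost` subtracted from an mRMR-style score (relevance minus mean redundancy with the features already chosen). The function name `cost_aware_greedy`, its parameters, and the precomputed `relevance`, `redundancy`, and `costs` inputs are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def cost_aware_greedy(
    relevance: Sequence[float],
    redundancy: Sequence[Sequence[float]],
    costs: Sequence[float],
    lam: float,
    k: int,
) -> List[int]:
    """Greedily pick up to k features.

    Each candidate j is scored as
        relevance[j] - mean(redundancy[j][i] for i in selected) - lam * costs[j]
    The subtracted lam * costs[j] term is an ASSUMED form of the cost term.
    """
    selected: List[int] = []
    remaining = set(range(len(relevance)))

    while remaining and len(selected) < k:

        def score(j: int) -> float:
            # Mean redundancy with already-selected features (0 if none yet).
            if selected:
                red = sum(redundancy[j][i] for i in selected) / len(selected)
            else:
                red = 0.0
            return relevance[j] - red - lam * costs[j]

        # Iterate in index order so ties go to the lowest feature index.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy, made-up inputs: feature 0 is the most relevant but also the most costly.
relevance = [0.9, 0.8, 0.3]
redundancy = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
costs = [1.0, 0.1, 0.1]

without_cost = cost_aware_greedy(relevance, redundancy, costs, lam=0.0, k=2)
with_cost = cost_aware_greedy(relevance, redundancy, costs, lam=1.0, k=2)
```

With `lam=0.0` the score reduces to the plain filter criterion; raising `lam` trades relevance for cheaper features, which matches the trade-off the abstract describes.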
Pages: 2481-2489
Number of pages: 9
Related papers
26 in total
[1]   Support vector machines combined with feature selection for breast cancer diagnosis [J].
Akay, Mehmet Fatih .
EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) :3240-3247
[2]  
[Anonymous], UCI MACHINE LEARNING
[3]  
[Anonymous], 1991, Artificial Intelligence
[4]  
[Anonymous], 1987, Multiple comparison procedures
[5]  
Bahamonde A., 2004, P 21 INT C MACH LEAR, P49
[6]   An ensemble of filters and classifiers for microarray data classification [J].
Bolon-Canedo, V. ;
Sanchez-Marono, N. ;
Alonso-Betanzos, A. .
PATTERN RECOGNITION, 2012, 45 (01) :531-539
[7]   Feature selection and classification in multiple class datasets: An application to KDD Cup 99 dataset [J].
Bolon-Canedo, V. ;
Sanchez-Marono, N. ;
Alonso-Betanzos, A. .
EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) :5947-5957
[8]   A review of feature selection methods on synthetic data [J].
Bolon-Canedo, Veronica ;
Sanchez-Marono, Noelia ;
Alonso-Betanzos, Amparo .
KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 34 (03) :483-519
[9]   Minimum redundancy feature selection from microarray gene expression data [J].
Ding, C ;
Peng, HC .
PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, :523-528
[10]   WEIGHTED SELECTION OF IMAGE FEATURES FOR RESOLVED RATE VISUAL FEEDBACK-CONTROL [J].
FEDDEMA, JT ;
LEE, CSG ;
MITCHELL, OR .
IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, 1991, 7 (01) :31-47