Feature selection and analysis on correlated gas sensor data with recursive feature elimination

Cited by: 367
Authors
Yan, Ke [1]
Zhang, David [2]
Affiliations
[1] Tsinghua Univ, Grad Sch Shenzhen, Dept Elect Engn, Shenzhen 518055, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Biometr Res Ctr, Kowloon, Hong Kong, Peoples R China
Source
SENSORS AND ACTUATORS B-CHEMICAL | 2015 / Vol. 212
Keywords
Feature selection; Feature ranking; SVM-RFE; Correlation bias; Breath analysis; Transient feature; SUPPORT VECTOR MACHINES; BREATH ANALYSIS SYSTEM; SOCIAL IMPACT THEORY; ELECTRONIC NOSE; GENE SELECTION; SVM-RFE; CANCER CLASSIFICATION; VARIABLE SELECTION; FEATURE-EXTRACTION; EXPRESSION DATA;
DOI
10.1016/j.snb.2015.02.025
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Support vector machine recursive feature elimination (SVM-RFE) is a powerful feature selection algorithm. However, when the candidate feature set contains highly correlated features, the ranking criterion of SVM-RFE will be biased, which would hinder the application of SVM-RFE on gas sensor data. In this paper, the linear and nonlinear SVM-RFE algorithms are studied. After investigating the correlation bias, an improved algorithm SVM-RFE + CBR is proposed by incorporating the correlation bias reduction (CBR) strategy into the feature elimination procedure. Experiments are conducted on a synthetic dataset and two breath analysis datasets, one of which contains temperature modulated sensors. Large and comprehensive sets of transient features are extracted from the sensor responses. The classification accuracy with feature selection proves the efficacy of the proposed SVM-RFE + CBR. It outperforms the original SVM-RFE and other typical algorithms. An ensemble method is further studied to improve the stability of the proposed method. By statistically analyzing the features' rankings, some knowledge is obtained, which can guide future design of e-noses and feature extraction algorithms. (C) 2015 Elsevier B.V. All rights reserved.
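The baseline that the abstract builds on, linear SVM-RFE, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny subgradient-descent trainer, the function names `train_linear_svm` and `svm_rfe`, the hyperparameters, and the synthetic data are all hypothetical. The elimination loop uses the squared-weight ranking criterion commonly associated with linear SVM-RFE. Columns 1 and 2 are exact copies of each other, so the weight that a single copy would receive is split between them, which is one way the correlation bias mentioned in the abstract can show up. The CBR step itself is not reproduced here because the abstract does not describe it in detail.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit a linear SVM by full-batch subgradient descent on the
    L2-regularised hinge loss. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regulariser
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample violates the margin
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y):
    """Recursive feature elimination: retrain, drop the feature with the
    smallest squared weight, repeat. Returns indices, best first."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]  # last eliminated = most important


# Hypothetical synthetic data: two informative signals plus one noise column.
random.seed(0)
X, y = [], []
for _ in range(60):
    label = random.choice([-1, 1])
    a = label + random.gauss(0, 0.5)   # informative feature
    c = label + random.gauss(0, 0.5)   # second informative signal
    noise = random.gauss(0, 1)         # irrelevant feature
    X.append([a, c, c, noise])         # columns 1 and 2 are perfectly correlated
    y.append(label)

w_full, _ = train_linear_svm(X, y)
ranking = svm_rfe(X, y)
print("weights on the full feature set:", [round(v, 3) for v in w_full])
print("ranking (best first):", ranking)
```

Comparing the printed weight of column 0 with the weights of columns 1 and 2 gives a rough feel for how a correlated group can be ranked lower than an equally informative single feature. A real experiment would use a proper SVM solver and cross-validated accuracy.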
Pages: 353-363
Page count: 11