A new feature selection method on classification of medical datasets: Kernel F-score feature selection

Cited by: 129
Authors
Polat, Kemal [1]
Gunes, Salih [1]
Affiliation
[1] Selcuk University, Department of Electrical and Electronics Engineering, TR-42075 Konya, Turkey
Keywords
Feature selection; Kernel F-score feature selection; Least Squares Support Vector Machine (LS-SVM); Levenberg-Marquardt Artificial Neural Network; Heart disease dataset; SPECT images dataset; Escherichia coli Promoter Gene Sequence dataset; IMMUNE RECOGNITION SYSTEM; FUZZY; AIRS
DOI
10.1016/j.eswa.2009.01.041
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a new feature selection method, kernel F-score feature selection (KFFS), used as a pre-processing step in the classification of medical datasets. KFFS consists of two phases. In the first phase, the input space (features) of a medical dataset is transformed into kernel space by means of a Linear (Lin) or Radial Basis Function (RBF) kernel function. In this way, the dataset is mapped into a high-dimensional feature space. In the second phase, the F-score of every feature in this high-dimensional space is calculated with the F-score formula, and the mean of the calculated F-scores is computed. A feature whose F-score is greater than this mean is selected; otherwise, it is removed from the feature space. KFFS thus removes irrelevant or redundant features from the high-dimensional feature space. Kernel functions are used to transform a non-linearly separable medical dataset into a linearly separable feature space. In this study, we used the heart disease dataset, the SPECT (Single Photon Emission Computed Tomography) images dataset, and the Escherichia coli Promoter Gene Sequence dataset, taken from the UCI (University of California, Irvine) machine learning repository, to test the performance of KFFS. A Least Squares Support Vector Machine (LS-SVM) and a Levenberg-Marquardt Artificial Neural Network were used as classification algorithms. The obtained results show that KFFS produces very promising results compared with F-score feature selection. (C) 2009 Elsevier Ltd. All rights reserved.
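The two phases described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that (a) "kernel space" is realized as the empirical kernel map, where each sample is represented by its kernel values against all N training samples, so d input features become N kernel features; (b) the F-score is the standard two-class formula used by Chen and Lin, F(j) = [(m+ - m)^2 + (m- - m)^2] / [s+^2 + s-^2], with m the overall column mean, m+ and m- the class means, and s+^2 and s-^2 the within-class sample variances (denominator n - 1); (c) labels are +1/-1 and features are already numeric (the promoter sequences would first need a numeric encoding); and (d) the RBF width `gamma` and every function name below are hypothetical.

```python
import math
from functools import partial
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
Kernel = Callable[[Sequence[float], Sequence[float]], float]


def linear_kernel(a: Sequence[float], b: Sequence[float]) -> float:
    """Linear (Lin) kernel: the plain dot product."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """RBF kernel exp(-gamma * ||a - b||^2); gamma is an assumed hyperparameter."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_map(X: Matrix, X_train: Matrix, kernel: Kernel) -> Matrix:
    """Phase 1 (assumed empirical kernel map): column j of the result holds
    the kernel value of each sample against training sample j."""
    return [[kernel(x, t) for t in X_train] for x in X]


def f_scores(Z: Matrix, y: Sequence[int]) -> List[float]:
    """Phase 2: two-class F-score of every column of Z.

    Label 1 is the positive class; any other label is treated as negative.
    Each class needs at least two samples for the n - 1 variance denominators.
    """
    eps = 1e-12  # assumption: avoids division by zero on constant columns
    scores = []
    for j in range(len(Z[0])):
        col = [row[j] for row in Z]
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label != 1]
        m, mp, mn = (sum(c) / len(c) for c in (col, pos, neg))
        between = (mp - m) ** 2 + (mn - m) ** 2
        within = (sum((v - mp) ** 2 for v in pos) / (len(pos) - 1)
                  + sum((v - mn) ** 2 for v in neg) / (len(neg) - 1))
        scores.append(between / (within + eps))
    return scores


def kffs(X_train: Matrix, y_train: Sequence[int],
         kernel: Kernel) -> Tuple[List[int], Matrix]:
    """Keep the kernel features whose F-score is greater than the mean F-score."""
    Z = kernel_map(X_train, X_train, kernel)
    scores = f_scores(Z, y_train)
    threshold = sum(scores) / len(scores)
    selected = [j for j, s in enumerate(scores) if s > threshold]
    reduced = [[row[j] for j in selected] for row in Z]
    return selected, reduced


def kffs_transform(X_new: Matrix, X_train: Matrix, kernel: Kernel,
                   selected: List[int]) -> Matrix:
    """Map unseen samples against the same training set; keep the same columns."""
    Z = kernel_map(X_new, X_train, kernel)
    return [[row[j] for j in selected] for row in Z]


if __name__ == "__main__":
    # Toy, made-up data: two classes, one input feature.
    X = [[0.0], [0.2], [2.0], [2.2]]
    y = [1, 1, -1, -1]
    k = partial(rbf_kernel, gamma=0.5)
    sel, Z_sel = kffs(X, y, k)
    print("selected kernel features:", sel)
    print("reduced training matrix:", Z_sel)
    print("mapped test sample:", kffs_transform([[1.9]], X, k, sel))
```

To classify new samples, the sketch maps them against the same training samples and keeps the same selected columns before they reach the classifier (LS-SVM or the Levenberg-Marquardt network in the paper). Thresholding at the mean F-score needs no tuning parameter, which matches the selection rule stated in the abstract; the small `eps` in the denominator is a simplifying choice of this sketch.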
Pages: 10367-10373
Number of pages: 7