Feature generation using genetic programming with comparative partner selection for diabetes classification

Cited by: 59
Authors
Aslam, Muhammad Waqar [1]
Zhu, Zhechen [2]
Nandi, Asoke Kumar [2,3]
Affiliations
[1] Univ Liverpool, Dept Elect Engn & Elect, Liverpool L69 3GJ, Merseyside, England
[2] Brunel Univ, Dept Elect & Comp Engn, Uxbridge UB8 3PH, Middx, England
[3] Univ Jyvaskyla, Dept Math Informat Technol, FI-40014 Jyvaskyla, Finland
Keywords
Pima Indian diabetes; Genetic programming; Comparative partner selection; Expert system; Diagnosis; Extraction; Design
DOI
10.1016/j.eswa.2013.04.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The ultimate aim of this research is to facilitate the diagnosis of diabetes, a disease whose prevalence is rising rapidly worldwide. In this research, a genetic programming (GP) based method is used for diabetes classification. GP generates new features as combinations of the existing diabetes features, without requiring prior knowledge of their probability distributions. The proposed method has three stages. At the first stage, feature selection is performed using the t-test, the Kolmogorov-Smirnov test, the Kullback-Leibler divergence, F-score selection, and GP. The results of these feature selection methods are used to prepare an ordered list of the original features, arranged in decreasing order of importance. Subsets of the original features are then built by sequential forward selection, adding one feature at a time in the order given by the list. At the second stage, GP is used to generate new features from each subset of the original diabetes features by forming non-linear combinations of those features. A variant of GP called GP with comparative partner selection (GP-CPS), which exploits the strengths and weaknesses of GP-generated features, is used at this stage. At the last stage, the classification performance of the GP-generated features is tested using k-nearest neighbor and support vector machine classifiers. The results, and their comparison with other methods, demonstrate that the proposed method outperforms the other recent methods it was compared against. (c) 2013 Elsevier Ltd. All rights reserved.
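The three stages summarised in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper:

- Only the t-test (Welch form), Kolmogorov-Smirnov and F-score rankings are shown. The Kullback-Leibler and GP-based rankings are omitted.
- The per-method rankings are merged by summing rank positions. The abstract does not say how the ordered list is actually formed.
- The "generated feature" is a hand-written hypothetical tree, not the output of GP-CPS. The evolutionary search and comparative partner selection are not reproduced.
- Only the k-nearest neighbor classifier is shown, evaluated by leave-one-out accuracy.

All function names and the toy data are hypothetical.

```python
import math
import operator
from collections import Counter
from statistics import mean, variance

# --- Stage 1: score each original feature (binary labels 0/1) ---

def welch_t(a, b):
    """Absolute Welch t statistic between class-0 values a and class-1 values b."""
    diff = abs(mean(a) - mean(b))
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff / se if se > 0 else (0.0 if diff == 0 else math.inf)

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    def ecdf(sample, v):
        return sum(s <= v for s in sample) / len(sample)
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in set(a) | set(b))

def f_score(a, b):
    """F-score: spread of class means about the overall mean over within-class variance."""
    m = mean(a + b)
    num = (mean(a) - m) ** 2 + (mean(b) - m) ** 2
    den = variance(a) + variance(b)  # sample variances (n - 1 denominators)
    return num / den if den > 0 else (0.0 if num == 0 else math.inf)

SCORERS = (welch_t, ks_stat, f_score)

def ordered_features(X, y):
    """Rank features under each score, then order them by summed rank (ASSUMED merge rule)."""
    d = len(X[0])
    rank_sum = [0] * d
    for scorer in SCORERS:
        scores = []
        for j in range(d):
            a = [row[j] for row, label in zip(X, y) if label == 0]
            b = [row[j] for row, label in zip(X, y) if label == 1]
            scores.append(scorer(a, b))
        for position, j in enumerate(sorted(range(d), key=lambda k: -scores[k])):
            rank_sum[j] += position
    return sorted(range(d), key=lambda j: (rank_sum[j], j))

def forward_subsets(order):
    """Sequential forward selection along the ordered list: nested subsets of size 1..d."""
    return [order[:k] for k in range(1, len(order) + 1)]

# --- Stage 2 (representation only): a GP expression tree as nested tuples ---

OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda p, q: p / q if q != 0 else 1.0,  # protected division, a common GP convention
}

def eval_tree(tree, row):
    """Leaf int = original feature index, leaf float = constant, tuple = (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](eval_tree(left, row), eval_tree(right, row))
    if isinstance(tree, int):
        return row[tree]
    return tree

# --- Stage 3: evaluate a feature set with k-NN ---

def knn_loo_accuracy(rows, y, k=3):
    """Leave-one-out accuracy of a majority-vote k-NN classifier with Euclidean distance."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, y)):
        neighbours = sorted(
            (math.dist(row, other), other_label)
            for j, (other, other_label) in enumerate(zip(rows, y)) if j != i
        )
        votes = Counter(lbl for _, lbl in neighbours[:k])
        correct += (votes.most_common(1)[0][0] == label)
    return correct / len(y)

if __name__ == "__main__":
    # Toy data (NOT the Pima data): feature 0 separates the classes, 1 and 2 barely do.
    X = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.1], [0.9, 5.1, 0.3],
         [3.0, 5.0, 0.2], [3.2, 4.9, 0.1], [2.8, 5.2, 0.3]]
    y = [0, 0, 0, 1, 1, 1]
    order = ordered_features(X, y)
    print(order)                   # [0, 1, 2]
    print(forward_subsets(order))  # [[0], [0, 1], [0, 1, 2]]
    tree = ("*", 0, ("+", 0, 0.5))  # hypothetical generated feature: x0 * (x0 + 0.5)
    new_feature = [[eval_tree(tree, row)] for row in X]
    print(knn_loo_accuracy(new_feature, y, k=3))  # 1.0
```

The subsets are nested. In the scheme the abstract describes, the stage-2 search would therefore be run once per subset, with the tree's feature-index leaves restricted to that subset. The resulting one-dimensional feature would then be handed to the classifiers.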
Pages: 5402-5412 (11 pages)