Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification

Cited by: 282
Authors
Jain, Indu [1]
Jain, Vinod Kumar [2]
Jain, Renu [1]
Affiliations
[1] Jiwaji Univ, Sch Math & Allied Sci, Gwalior 474006, MP, India
[2] PDPM Indian Inst Informat Technol Design & Mfg, Dumna Airport Rd,PO Khamaria, Jabalpur, MP, India
Keywords
Microarray data analysis; Cancer classification; Improved Binary Particle Swarm Optimization (iBPSO); Hybrid model; Gene selection; Naive-Bayes; TUMOR; PREDICTION; DIAGNOSIS; PATTERNS
DOI
10.1016/j.asoc.2017.09.038
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
DNA microarray technology has emerged as a promising tool for the diagnosis and classification of cancer. It provides better insight into the many cancer-associated genetic mutations occurring within a cell. However, the thousands of gene expression values that a microarray measures for each biological sample pose a great challenge. Many statistical and machine learning methods have been applied to identify the most relevant genes prior to cancer classification. A two-phase hybrid model for cancer classification is proposed, integrating Correlation-based Feature Selection (CFS) with improved-Binary Particle Swarm Optimization (iBPSO). The model selects a low-dimensional set of prognostic genes to classify biological samples of binary and multi-class cancers using a Naive Bayes classifier with stratified 10-fold cross-validation. The proposed iBPSO also mitigates the early convergence to a local optimum seen in traditional BPSO. The model has been evaluated on 11 benchmark microarray datasets of different cancer types. Experimental results are compared with seven other well-known methods, and the model exhibited better results in terms of classification accuracy and number of selected genes in most cases. In particular, it achieved up to 100% classification accuracy on seven of the eleven datasets, with a very small prognostic gene subset (under 1.5%) for all eleven datasets. (C) 2017 Elsevier B.V. All rights reserved.
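The wrapper stage described in the abstract can be illustrated with a minimal sketch of a standard binary PSO, in which a sigmoid transfer function turns each velocity into the probability of selecting a gene. This is not the authors' iBPSO: the abstract does not specify the modification that counters early convergence, so none is implemented here. The function names, the parameter values (`w`, `c1`, `c2`, `v_max`), and the `toy_fitness` function with its `INFORMATIVE` index set are all assumptions made for illustration. In the setting the abstract describes, the fitness would instead be the Naive Bayes accuracy under stratified 10-fold cross-validation on the CFS-filtered genes.

```python
import math
import random


def sigmoid(v):
    """Logistic transfer function mapping a velocity to a bit probability."""
    return 1.0 / (1.0 + math.exp(-v))


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize `fitness` over 0/1 masks of length `n_features` (standard BPSO)."""
    rng = random.Random(seed)
    # Random initial masks, zero initial velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the velocity so the sigmoid does not saturate.
                vel[i][d] = max(-v_max, min(v_max, v))
                # Resample the bit: 1 with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical stand-in for classifier accuracy: reward picking the
# "informative" indices and penalize every selected feature.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits / len(INFORMATIVE) - 0.01 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=12)
print(best_mask, round(best_fit, 3))
```

The size penalty in `toy_fitness` mirrors the abstract's two goals, high accuracy and few selected genes, but the 0.01 weight is an arbitrary choice for this sketch.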
Pages: 203-215
Number of pages: 13