A novel gene selection algorithm for cancer classification using microarray datasets

Times cited: 53
Authors
Alanni, Russul [1]
Hou, Jingyu [1]
Azzawi, Hasseeb [1]
Xiang, Yong [1]
Affiliation
[1] Deakin University, School of Information Technology, Burwood, VIC 3125, Australia
Keywords
Gene selection; Gene expression programming; Support vector machine; Microarray cancer dataset; Particle swarm optimization; Molecular classification; Expression; Prediction; Carcinomas; Projection; Reduction
DOI
10.1186/s12920-018-0447-6
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background
Microarray datasets are an important medical diagnostic tool because they represent the state of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a small number of samples compared with the large number of genes involved. This imbalance is known as the curse of dimensionality, and it is a challenging problem. Gene selection is a promising approach to this problem and plays an important role in developing efficient cancer classifiers, because only a small number of genes are relevant to the classification task. Gene selection addresses several problems in microarray datasets, such as removing irrelevant and noisy genes and selecting the most relevant genes to improve classification results.

Methods
An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method, with a newly defined population initialization algorithm, a new fitness function definition, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier in GSP.

Results
Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient at eliminating irrelevant and redundant genes (features) from microarray datasets. Comprehensive evaluations and comparisons with other methods show that GSP offers a better compromise across all three evaluation criteria: classification accuracy, number of selected genes, and computational cost. The gene sets selected by GSP showed superior cancer-classification performance compared with those selected by up-to-date representative gene selection methods.

Conclusion
The gene subset selected by GSP can achieve higher classification accuracy with less processing time.
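The Methods paragraph describes a wrapper approach: candidate gene subsets are evolved, and each candidate is scored with a linear-kernel SVM. The abstract does not specify GSP's initialization algorithm, fitness function, or operators, so the sketch below is NOT GSP. It is a plain genetic algorithm over 0/1 gene masks (not GEP), assuming a hypothetical fitness of validation accuracy minus a penalty `alpha` times the fraction of genes kept, which trades off two of the three criteria the abstract names. The linear SVM is a minimal Pegasos solver (stochastic sub-gradient descent on the regularized hinge loss) for binary labels in {-1, +1}. All function names (`train_linear_svm`, `project`, `subset_fitness`, `evolve`) and parameter values are illustrative assumptions.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the
    L2-regularised hinge loss. Labels must be -1/+1. A constant 1.0 feature
    is appended so the bias is learned as an ordinary (regularised) weight."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regulariser shrink
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w

def predict(w, row):
    score = sum(wj * xj for wj, xj in zip(w, list(row) + [1.0]))
    return 1 if score >= 0 else -1

def project(X, mask):
    """Keep only the columns (genes) whose bit in mask is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in X]

def subset_fitness(mask, X_tr, y_tr, X_val, y_val, alpha=0.1):
    """Hypothetical wrapper fitness: validation accuracy of a linear SVM
    trained on the selected genes, minus alpha * (fraction of genes kept)."""
    k = sum(mask)
    if k == 0:
        return 0.0  # an empty gene subset is treated as useless
    w = train_linear_svm(project(X_tr, mask), y_tr)
    Xv = project(X_val, mask)
    acc = sum(predict(w, r) == lab for r, lab in zip(Xv, y_val)) / len(y_val)
    return acc - alpha * k / len(mask)

def evolve(X_tr, y_tr, X_val, y_val, pop_size=8, generations=20,
           p_mut=None, seed=1):
    """Plain generational GA over 0/1 gene masks (needs >= 2 genes):
    one-point recombination, bit-flip mutation and elitism. Not GEP."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    p_mut = p_mut if p_mut is not None else 1.0 / n
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:  # avoid retraining the SVM for repeated masks
            cache[key] = subset_fitness(mask, X_tr, y_tr, X_val, y_val)
        return cache[key]

    # Assumption: seed the population with the all-genes mask, so with
    # elitism the result is never worse than using every gene.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        next_pop = [pop[0][:]]  # elitism: carry the current best over
        while len(next_pop) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from top half
            cut = rng.randint(1, n - 1)  # one-point recombination
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fit)
    return best, fit(best)
```

On a toy dataset where only the first gene separates the classes, a mask keeping just that gene scores higher than one that also keeps an uninformative gene, because the accuracy is the same and the size penalty is smaller. On real microarray data, `X_val`/`y_val` should be samples held out from training, and multi-class datasets would need an extra scheme such as one-vs-rest, which this sketch does not include.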
Pages: 12