Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification

Cited by: 81
Authors
Gao, Lingyun [1]
Ye, Mingquan [1]
Lu, Xiaojie [1]
Huang, Daobin [1]
Affiliations
[1] Wannan Med Coll, Sch Med Informat, Wuhu 241002, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Gene selection; Cancer classification; Information gain; Support vector machine; Small sample size with high dimension; CONGENITAL MUSCULAR-DYSTROPHY; HEPSIN GENE; EXPRESSION; OPTIMIZATION; MUTATIONS; ALGORITHM; VARIANTS; INPP5K;
DOI
10.1016/j.gpb.2017.08.002
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Achieving sufficient cancer classification accuracy with the entire set of genes remains a great challenge because gene expression data are high-dimensional, noisy, and drawn from small samples. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM), in this study. IG was first employed to filter out irrelevant and redundant genes. SVM was then used to remove further redundant genes, eliminating noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Evaluated on five cancer gene expression datasets, IG-SVM achieved the highest classification accuracy among the related algorithms compared while using only a few selected genes. For example, on colon cancer, which is difficult to classify accurately, IG-SVM achieved a classification accuracy of 90.32% using only three genes: CSRP1, MYL9, and GUCA2B.
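The first stage described in the abstract, ranking genes by information gain, can be illustrated with a minimal sketch. It uses only the Python standard library and covers the IG filter alone; the SVM-based redundancy removal and the LIBSVM classification stages are not shown. Binarizing each gene at its median, the function names, and the toy expression values are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one gene.

    Assumption: the continuous expression values X are binarized at their
    median; the discretization used in the paper is not stated in the abstract.
    """
    cut = median(values)
    low = [y for x, y in zip(values, labels) if x <= cut]
    high = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    # Weighted entropy of the class labels within each expression bin.
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional


def rank_genes(expression, labels):
    """Return (gene names sorted by decreasing IG, dict of IG scores)."""
    scores = {gene: information_gain(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data (hypothetical): six samples, two genes.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "gene_a": [9.1, 8.7, 9.4, 2.0, 2.3, 1.8],  # separates the classes cleanly
    "gene_b": [5.0, 1.0, 3.0, 5.1, 0.9, 3.1],  # nearly uninformative
}
ranking, scores = rank_genes(expression, labels)
```

In a filter stage of this kind, only the top-ranked genes would be passed on to the later wrapper and classifier stages; how many to keep is a tuning choice that this sketch does not address.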
Pages: 389-395
Number of pages: 7