Can-Evo-Ens: Classifier stacking based evolutionary ensemble system for prediction of human breast cancer using amino acid sequences

Cited by: 23
Authors
Ali, Safdar [1]
Majid, Abdul [1]
Affiliations
[1] Pakistan Inst Engn & Appl Sci, Dept Comp & Informat Sci, Islamabad 45650, Pakistan
Keywords
Breast cancer; Amino acids; Physicochemical properties; Genetic programming; Stacking ensemble; PROTEIN; FUSION;
DOI
10.1016/j.jbi.2015.01.004
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The diagnosis of human breast cancer is an intricate process, and specific indicators may produce negative results. To avoid misleading results, an accurate and reliable diagnostic system for breast cancer is indispensable. Recently, several interesting machine-learning (ML) approaches have been proposed for the prediction of breast cancer. To this end, we developed a novel classifier-stacking-based evolutionary ensemble system, "Can-Evo-Ens", for predicting amino acid sequences associated with breast cancer. In this paper, we first selected four diverse types of ML algorithms, Naive Bayes, K-Nearest Neighbor, Support Vector Machines, and Random Forest, as base-level classifiers. These classifiers are trained individually in different feature spaces using physicochemical properties of amino acids. To exploit the decision spaces, the preliminary predictions of the base-level classifiers are stacked. Genetic programming (GP) is then employed to develop a meta-classifier that optimally combines the predictions of the base classifiers. The most suitable threshold value of the best-evolved predictor is computed using the Particle Swarm Optimization technique. Our experiments demonstrated the robustness of the Can-Evo-Ens system on an independent validation dataset. The proposed system achieved the highest area under the ROC curve (AUC), 99.95%, for cancer prediction. The comparative results revealed that the proposed approach is better than individual ML approaches and the conventional ensemble approaches AdaBoostM1, Bagging, GentleBoost, and Random Subspace. It is expected that the proposed system would have a major impact on the fields of biomedicine, genomics, proteomics, bioinformatics, and drug development. (C) 2015 Elsevier Inc. All rights reserved.
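The threshold-tuning step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`accuracy`, `pso_threshold`), the toy scores and labels, the accuracy objective, the evenly spaced initial positions, and all hyperparameter values are assumptions made for illustration. It shows a basic one-dimensional Particle Swarm Optimization search for the decision threshold applied to a predictor's output scores.

```python
import random


def accuracy(threshold, scores, labels):
    """Fraction of samples classified correctly when score >= threshold means class 1."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def pso_threshold(scores, labels, n_particles=21, n_iters=30,
                  inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search [0, 1] for a decision threshold with a basic 1-D particle swarm.

    All hyperparameters here are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Assumption: particles start evenly spaced over [0, 1] with small random velocities.
    pos = [i / (n_particles - 1) for i in range(n_particles)]
    vel = [rng.uniform(-0.1, 0.1) for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = pos[:]
    pbest_fit = [accuracy(p, scores, labels) for p in pos]

    # Global best is the best personal best seen so far.
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Standard velocity update: inertia + cognitive pull + social pull.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            # Move the particle and clamp it to the valid threshold range.
            pos[i] = min(1.0, max(0.0, pos[i] + vel[i]))
            fit = accuracy(pos[i], scores, labels)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i], fit
    return gbest, gbest_fit


# Hypothetical predictor scores and true labels (1 = cancer-related, 0 = not).
scores = [0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, fit = pso_threshold(scores, labels)
```

In this toy setting any threshold between the two score groups separates the classes, so the swarm only needs to land in that interval. On real data the objective would more plausibly be computed on held-out predictions, and a different criterion than plain accuracy may be preferable.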
Pages: 256-269
Page count: 14