A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data

Cited by: 5
Authors
Alromema, Nashwan [1]
Syed, Asif Hassan [1]
Khan, Tabrej [2]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Comp Sci, Jeddah 22254, Saudi Arabia
[2] King Abdulaziz Univ, Fac Comp & Informat Technol Rabigh FCITR, Dept Informat Syst, Jeddah 22254, Saudi Arabia
Keywords
primary breast tumor; gene-biomarkers; hybrid-feature selection approach; filter-based FS; two-tailed unpaired t-test; meta-heuristics techniques; supervised machine learning classifiers; breast tumor prediction; FEATURE-SELECTION ALGORITHM; CANCER; PROTEIN; MAPK; OPTIMIZATION; BIOMARKER; RISK; ENAH
DOI
10.3390/diagnostics13040708
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
The high dimensionality and sparsity of microarray gene expression data make it challenging to analyze the data and screen the optimal subset of genes as predictors of breast cancer (BC). This study proposes a novel hybrid, sequential Feature Selection (FS) framework that combines minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen an optimal set of gene biomarkers as predictors for BC. The proposed framework identified a set of three optimal gene biomarkers, namely MAPK1, APOBEC3B, and ENAH. In addition, state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naive Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were used to test the predictive capability of the selected gene biomarkers and to select the most effective breast cancer diagnostic model based on the values of the performance metrics. The XGBoost-based model was the best performer, with an accuracy of 0.976 ± 0.027, an F1-score of 0.974 ± 0.030, and an AUC of 0.961 ± 0.035 when tested on an independent test dataset. The classification system based on the screened gene biomarkers efficiently distinguishes primary breast tumors from normal breast samples.
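As a rough illustration of the filter step named in the abstract, the sketch below ranks genes by the absolute value of a two-sample (unpaired) t statistic between tumor and normal samples. It is not the authors' implementation: the function names, the pooled-variance (equal variance) form of the test, the gene names, and the synthetic data are all assumptions made for this example, and the mRMR and meta-heuristic stages are omitted. With fixed group sizes, ordering genes by |t| gives the same ordering as the two-tailed p-value, so no p-value is computed here.

```python
import math
import random
from statistics import mean, variance


def unpaired_t_statistic(group_a, group_b):
    """Student's unpaired t statistic, assuming equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_genes_by_t(expression, labels):
    """Return gene names sorted by |t|, largest first.

    expression: dict mapping a gene name to a list of values, one per sample.
    labels: list of 1 (tumor) or 0 (normal), aligned with the samples.
    """
    scores = {}
    for gene, values in expression.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(unpaired_t_statistic(tumor, normal))
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic toy data (hypothetical): one gene is shifted in tumor samples.
rng = random.Random(0)
labels = [1] * 20 + [0] * 20
expression = {
    "GENE_SHIFTED": [rng.gauss(5.0, 1.0) for _ in range(20)]
    + [rng.gauss(0.0, 1.0) for _ in range(20)],
    "GENE_NOISE_1": [rng.gauss(0.0, 1.0) for _ in range(40)],
    "GENE_NOISE_2": [rng.gauss(0.0, 1.0) for _ in range(40)],
}

ranking = rank_genes_by_t(expression, labels)
top_genes = ranking[:1]  # keep the k highest-ranked genes (k = 1 here)
print(top_genes)
```

In a fuller pipeline, the genes retained by this filter would be passed to the later selection stages and then to a classifier evaluated on held-out data.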
Pages: 31
Related papers
22 in total
  • [1] A Hybrid BPSO-CGA Approach for Gene Selection and Classification of Microarray Data
    Chuang, Li-Yeh
    Yang, Cheng-Huei
    Li, Jung-Chike
    Yang, Cheng-Hong
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2012, 19 (01) : 68 - 82
  • [2] A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization
    Sharbaf, Fatemeh Vafaee
    Mosafer, Sara
    Moattar, Mohammad Hossein
    GENOMICS, 2016, 107 (06) : 231 - 238
  • [3] Deep learning techniques for cancer classification using microarray gene expression data
    Gupta, Surbhi
    Gupta, Manoj K.
    Shabaz, Mohammad
    Sharma, Ashutosh
    FRONTIERS IN PHYSIOLOGY, 2022, 13
  • [4] Cancer Classification Based on Microarray Gene Expression Data Using Deep Learning
    Guillen, Pablo
    Ebalunode, Jerry
    2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE & COMPUTATIONAL INTELLIGENCE (CSCI), 2016 : 1403 - 1405
  • [5] Dissimilarity based ensemble of extreme learning machine for gene expression data classification
    Lu, Hui-juan
    An, Chun-lin
    Zheng, En-hui
    Lu, Yi
    NEUROCOMPUTING, 2014, 128 : 22 - 30
  • [6] Discriminant Projection Shared Dictionary Learning for Classification of Tumors Using Gene Expression Data
    Peng, Shaoliang
    Yang, Yaning
    Liu, Wei
    Li, Fei
    Liao, Xiangke
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 18 (04) : 1464 - 1473
  • [7] A Review on Recent Progress in Machine Learning and Deep Learning Methods for Cancer Classification on Gene Expression Data
    Mazlan, Aina Umairah
    Sahabudin, Noor Azida
    Remli, Muhammad Akmal
    Ismail, Nor Syahidatul Nadiah
    Mohamad, Mohd Saberi
    Nies, Hui Wen
    Abd Warif, Nor Bakiah
    PROCESSES, 2021, 9 (08)
  • [8] Quantitative diagnosis of breast tumors by morphometric classification of microenvironmental myoepithelial cells using a machine learning approach
    Yamamoto, Yoichiro
    Saito, Akira
    Tateishi, Ayako
    Shimojo, Hisashi
    Kanno, Hiroyuki
    Tsuchiya, Shinichi
    Ito, Ken-ichi
    Cosatto, Eric
    Graf, Hans Peter
    Moraleda, Rodrigo R.
    Eils, Roland
    Grabe, Niels
    SCIENTIFIC REPORTS, 2017, 7
  • [9] Prediction of tumor purity from gene expression data using machine learning
    Koo, Bonil
    Rhee, Je-Keun
    BRIEFINGS IN BIOINFORMATICS, 2021, 22 (06)
  • [10] FCM-SVM-RFE gene feature selection algorithm for leukemia classification from microarray gene expression data
    Tang, YC
    Zhang, YQ
    Huang, Z
    FUZZ-IEEE 2005: PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS: BIGGEST LITTLE CONFERENCE IN THE WORLD, 2005 : 97 - 101