Efficient Bioinspired Feature Selection and Machine Learning Based Framework Using Omics Data and Biological Knowledge Data Bases in Cancer Clinical Endpoint Prediction

Cited by: 8
Authors
Zenbout, Imene [1, 2]
Bouramoul, Abdelkrim
Meshoul, Souham [3 ]
Amrane, Mounira [4 ]
Affiliations
[1] Constantine 2 Univ Abdelhamid Mehri, Dept Fundamental Informat & Its Applicat, MISC Lab, Constantine, Algeria
[2] Natl Biotechnol Res Ctr CRBT, Constantine 25000, Algeria
[3] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Technol, Riyadh, Saudi Arabia
[4] Univ Ferhat Abbas Set 1, Set Univ Hosp, Biochem Dept, Setif, Algeria
Keywords
Biomarker discovery; Integrative omics; cancer classification; deep canonical correlation analysis; enrichment analysis; feature selection; grey wolf optimization; machine learning; FUNCTIONAL PROTEOMICS; BIOMARKER DISCOVERY; EXPRESSION; DIAGNOSIS; PATHWAY;
DOI
10.1109/ACCESS.2023.3234294
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Cancer research has advanced considerably during the past few years. High-throughput technologies and advances in artificial intelligence now make it possible to improve cancer diagnosis and targeted therapy by integrating the analysis of clinical and omics profiles. The high dimensionality and class imbalance of most available data sets pose a serious challenge to the development of computational methods and tools for cancer diagnosis and biomarker discovery, and taking multi-omics data into account complicates the task further. In this paper, we describe a five-step integrative architecture that addresses these three problems by incorporating proteomics data, protein-protein interaction networks, and signaling pathways in order to identify protein biomarkers directly associated with cancer patients' overall survival (OS) and progression-free interval (PFI). The core parts of this architecture are a cluster-based grey wolf optimization algorithm (CB-GWO) for feature selection and a deep stacked canonical correlation autoencoder (DSCC-AE) for clinical endpoint prediction. A thorough experimental study was carried out on breast, lung, colon, and rectum cancers to evaluate the performance of the proposed optimization algorithm for feature selection, as well as the performance of the deep learning model in terms of the Matthews correlation coefficient (MCC) and the area under the curve (AUC). The results were compared with those of other methods in the literature. They show that the proposed framework is effective and outperforms the other algorithms and models in terms of AUC (0.91) and MCC (0.64). In addition, hub marker genes with potential alterations in colorectal cancer, breast cancer, and lung cancer have been identified.
Pages: 2674-2699
Page count: 26