A deep learning model predicts the presence of diverse cancer types using circulating tumor cells

Cited by: 2
Authors
Albaradei, Somayah [1]
Alganmi, Nofe [1,2]
Albaradie, Abdulrahman [3]
Alharbi, Eaman
Motwalli, Olaa [4]
Thafar, Maha A. [5]
Gojobori, Takashi [6,7]
Essack, Magbubah [6,7]
Gao, Xin [6,7]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 80200, Saudi Arabia
[2] King Abdulaziz Univ, Ctr Excellence Genom Med Res, Jeddah 21589, Saudi Arabia
[3] Al Hada Armed Forces Hosp, Taif, Saudi Arabia
[4] Saudi Elect Univ SEU, Coll Comp & Informat, Madinah, Saudi Arabia
[5] Taif Univ, Coll Comp & Informat Technol, Taif, Saudi Arabia
[6] King Abdullah Univ Sci & Technol KAUST, Computat Biosci Res Ctr CBRC, Thuwal, Saudi Arabia
[7] King Abdullah Univ Sci & Technol KAUST, Comp Elect & Math Sci & Engn Div CEMSE, Comp Sci Program, Thuwal, Saudi Arabia
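Keywords
PROLIFERATION; METASTASIS; INVASION; MIR-15A
DOI
10.1038/s41598-023-47805-2
Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract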
Circulating tumor cells (CTCs) are cancer cells that detach from the primary tumor and intravasate into the bloodstream. Non-invasive liquid biopsies are therefore being used to analyze CTC-expressed genes and identify potential cancer biomarkers. Several studies have used gene expression changes in blood to predict the presence of CTCs and, consequently, cancer. However, CTC mRNA data have not been used to develop a generic approach that indicates the presence of multiple cancer types. In this study, we developed such a generic approach. Briefly, we designed two computational workflows, one using the raw mRNA data with deep learning (DL) and the other exploiting five hub gene ranking algorithms (Degree, Maximum Neighborhood Component, Betweenness Centrality, Closeness Centrality, and Stress Centrality) with machine learning (ML). Both workflows aim to determine the top genes that best distinguish cancer types based on the CTC mRNA data. We demonstrate that our automated, robust DL framework (DNNraw) indicates the presence of multiple cancer types from CTC gene expression data more accurately than multiple ML approaches. The DL approach achieved an average precision of 0.9652, recall of 0.9640, F1-score of 0.9638, and overall accuracy of 0.9640. Furthermore, since we designed multiple approaches, we also provide a bioinformatics analysis of the gene commonly identified as top-ranked by the different methods. To our knowledge, this is the first study in which a generic approach has been developed to predict the presence of multiple cancer types using raw CTC mRNA data, as opposed to other models that require a feature selection step.
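The averaged scores quoted in the abstract can be illustrated with a minimal sketch. The abstract does not state which averaging scheme was used; the sketch assumes support-weighted averaging, which is at least consistent with the reported recall and overall accuracy being equal (both 0.9640), since weighted-average recall always equals accuracy. The function name `weighted_metrics` and the toy class labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, F1, plus overall accuracy.

    Assumption: per-class scores are averaged with weights proportional to
    the number of true samples in each class (the abstract does not say
    which averaging the authors used).
    """
    n = len(y_true)
    support = Counter(y_true)        # true samples per class
    predicted = Counter(y_pred)      # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up labels for illustration only; not data from the study.
y_true = ["breast", "breast", "breast", "lung", "lung", "healthy"]
y_pred = ["breast", "breast", "lung", "lung", "lung", "healthy"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With this weighting, recall and accuracy coincide by construction, while precision and F1 can differ from them, which matches the pattern of the four reported values.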
Pages: 14