Identifying the tumor location-associated candidate genes in development of new drugs for colorectal cancer using machine-learning-based approach

Cited by: 1
Authors
Bayrak, Tuncay [1]
Cetin, Zafer [2,3]
Saygili, E. Ilker [4,5]
Ogul, Hasan [6]
Affiliations
[1] Turkish Med & Med Devices Agcy, Ankara, Turkey
[2] SANKO Univ, Sch Med, Dept Med Biol, Gaziantep, Turkey
[3] SANKO Univ, Inst Grad Educ, Dept Biol & Biomed Sci, Gaziantep, Turkey
[4] SANKO Univ, Dept Med Biochem, Sch Med, Gaziantep, Turkey
[5] SANKO Univ, Grad Inst Educ, Dept Mol Med, Gaziantep, Turkey
[6] Ostfold Univ Coll, Fac Comp Sci, POB 700, N-1757 Halden, Norway
Keywords
Machine-learning; Classification; Druggable gene; Tumor location; Colorectal cancer; Gene expression; SIDED COLON-CANCER; MICROARRAY DATA; MICROSATELLITE INSTABILITY; EXPRESSION SIGNATURE; CLASSIFICATION; IDENTIFICATION; METHYLATION; SUBSET; PROLIFERATION; METASTASIS
DOI
10.1007/s11517-022-02641-w
Chinese Library Classification
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Numerous studies have examined how tumor location relates to prognosis and treatment efficacy in colorectal cancer. Left- and right-sided colorectal cancers differ in their molecular pathways and prognoses, but this difference has not been fully investigated at the genomic level. In this study, a set of data science approaches, comprising six feature selection methods and three classification models, was used to predict tumor location from gene expression profiles. Classification performance was evaluated with specificity, sensitivity, accuracy, and the Matthews correlation coefficient (MCC). Gene Ontology enrichment analysis was performed with the PANTHER Classification System. For the 50 most significant genes, molecular pathways, protein-protein interactions, and drug-gene interactions were analyzed using GeneMANIA, Cytoscape, CytoHubba, MCODE, and DGIdb. The highest classification accuracy (90%) was achieved with the 200 most significant genes, using the ReliefF feature selection method together with an ensemble decision tree classifier. The authors conclude that a machine-learning-based approach could help discover significant genes that may play an important role in developing new therapies and drugs for colorectal cancer.
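The abstract names the evaluation metrics and the ReliefF feature-ranking step but does not describe how they were implemented. The Python sketch below is illustrative only and is not the authors' code. It computes sensitivity, specificity, accuracy, and MCC from a two-class confusion matrix, and scores features with a simplified two-class, k-nearest-neighbour ReliefF variant (with two classes, ReliefF's class-prior weighting on misses equals 1, so the update reduces to this form). The function names `binary_metrics`, `relief_scores`, and `top_features`, the `"left"`/`"right"` labels, the use of every sample instead of random sampling, and the Manhattan distance over min-max-scaled features are all assumptions.

```python
import math


def binary_metrics(y_true, y_pred, positive="right"):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient
    for a two-class problem. `positive` is the label treated as the positive
    class (the "left"/"right" labels are hypothetical)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


def relief_scores(X, y, k=1):
    """Two-class ReliefF-style feature weights (assumed variant: every sample
    is used once, k nearest hits/misses, Manhattan distance on min-max
    scaled features). Higher weight means the feature separates classes better."""
    n, d = len(X), len(X[0])
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(b - a) or 1.0 for a, b in zip(lo, hi)]  # avoid division by zero

    def diff(j, a, b):
        # Per-feature difference scaled to [0, 1].
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        neighbours = sorted((dist(X[i], X[o]), o) for o in range(n) if o != i)
        hits = [o for _, o in neighbours if y[o] == y[i]][:k]
        misses = [o for _, o in neighbours if y[o] != y[i]][:k]
        for j in range(d):
            if hits:  # same-class neighbours should be close on good features
                w[j] -= sum(diff(j, X[i], X[h]) for h in hits) / (n * len(hits))
            if misses:  # other-class neighbours should be far on good features
                w[j] += sum(diff(j, X[i], X[m]) for m in misses) / (n * len(misses))
    return w


def top_features(weights, count):
    """Indices of the `count` highest-weighted features, best first."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:count]
```

If genes were the columns of `X`, calling `top_features(relief_scores(X, y), 200)` would produce a gene subset of the size reported in the abstract. The ensemble decision tree classifier itself is not sketched here; `binary_metrics` would only be applied to its held-out predictions.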
Pages: 2877-2897
Page count: 21