Combining Mutation and Gene Network Data in a Machine Learning Approach for False-Positive Cancer Driver Gene Discovery

Cited by: 3
Authors
Cutigi, Jorge Francisco [1,2]
Evangelista, Renato Feijo [2]
Ramos, Rodrigo Henrique [1,2]
Lage Ferreira, Cynthia de Oliveira [2]
Evangelista, Adriane Feijo [3]
de Carvalho, Andre C. P. L. F. [2]
Simao, Adenilso [2]
Affiliations
[1] Fed Inst Sao Paulo, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Sao Carlos, SP, Brazil
[3] Barretos Canc Hosp, Barretos, SP, Brazil
Source
ADVANCES IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, BSB 2020 | 2020 / Vol. 12558
Keywords
Cancer bioinformatics; Driver genes; False-positive driver; Complex networks; Machine learning; PATHWAYS
DOI
10.1007/978-3-030-65775-8_8
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Interest in cancer genomics research has grown with the advent and widespread use of next-generation sequencing technologies, which have generated a large amount of digital biological data. However, not all of this information actually contributes to cancer studies. For instance, false-positive driver genes may have characteristics of cancer genes but are not actually relevant to cancer initiation and progression. Including this type of gene in cancer studies may lead to the identification of unrealistic trends in the data and mislead biomedical decisions. Therefore, proper screening to detect this specific type of gene among genes considered drivers is of utmost importance. This work focuses on the development of models dedicated to this task. The Support Vector Machine (SVM) and Random Forest (RF) machine learning algorithms were selected to induce predictive models that classify putative driver genes as real drivers or false-positive drivers based on both mutation data and gene network interactions. The results confirmed that combining the two sources of information improves the performance of the models. Moreover, the SVM and RF models achieved classification accuracies of 85.0% and 82.4%, respectively, on labeled data. Finally, a literature-based analysis was performed on the classification of a new set of genes to further validate the concept.
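The abstract describes classifying genes from both mutation data and gene network interactions, but it does not say which features were used. The following is a minimal sketch of how the two sources might be joined into one feature vector per gene, not the authors' method. The gene names (`G1` to `G5`), the mutation features (a count and a frequency), the choice of degree centrality as the network feature, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {gene: degree / (n - 1)} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {gene: 0.0 for gene in neighbors}
    return {gene: len(nb) / (n - 1) for gene, nb in neighbors.items()}


def combine_features(mutation_features, edges):
    """Append one network feature to each gene's mutation features.

    Genes that do not appear in the network get a centrality of 0.0.
    """
    centrality = degree_centrality(edges)
    return {
        gene: list(mut) + [centrality.get(gene, 0.0)]
        for gene, mut in mutation_features.items()
    }


# Hypothetical toy data: [mutation count, mutation frequency] per gene.
mutation_features = {
    "G1": [12, 0.40],
    "G2": [3, 0.10],
    "G3": [7, 0.25],
    "G4": [1, 0.05],
}
# Hypothetical undirected gene-gene interactions.
edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"), ("G1", "G5")]

features = combine_features(mutation_features, edges)
for gene in sorted(features):
    print(gene, features[gene])
```

Vectors built this way could then be passed, together with driver or false-positive labels, to any SVM or Random Forest implementation. Comparing a model trained on the mutation columns alone against one trained on the combined vector is one way to check whether the network information helps.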
Pages: 81-92
Page count: 12