CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations

Authors
Wang, Lihua [1, 2]
Sun, Haiyang [3]
Yue, Zhenyu [4]
Xia, Junfeng [1]
Li, Xiaoyan [1]
Affiliations
[1] Anhui Univ, Inst Phys Sci & Informat Technol, Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei, Peoples R China
[2] HuangShan Univ, Sch Informat Engn, Huangshan, Anhui, Peoples R China
[3] Nankai Univ, State Key Lab Med Chem Biol, Tianjin, Peoples R China
[4] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
Source
PEERJ | 2024 | Volume 12
Funding
National Natural Science Foundation of China
Keywords
Cancer; Machine learning; Driver missense mutation prediction; Benchmark quality; XGBoost; SYNONYMOUS VARIANTS; PATHOGENICITY; IDENTIFICATION; IMPACT; LUNG
DOI
10.7717/peerj.17991
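Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09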
Abstract
Most computational methods for predicting driver mutations have been trained using positive samples, while negative samples are typically derived from statistical methods or putative samples. Whether these negative samples capture the diversity of passenger mutations remains to be determined. To address this issue, we curated a balanced dataset comprising driver mutations sourced from the COSMIC database and high-quality passenger mutations obtained from the Cancer Passenger Mutation database. We then encoded the distinctive features of these mutations. Using feature correlation analysis and feature selection with the ensemble learning technique XGBoost, we developed a cancer driver missense mutation predictor called CDMPred. The proposed CDMPred method, using the top 10 features and XGBoost, achieved area under the receiver operating characteristic curve (AUC) values of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred demonstrated superior performance compared to existing state-of-the-art methods for cancer-specific and general diseases, as measured by AUC and area under the precision-recall curve. Including high-quality passenger mutations in the training data proves advantageous for CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations, furthering our understanding of personalized therapy.
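The two evaluation metrics named in the abstract, AUC and area under the precision-recall curve, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `roc_auc` and `average_precision` are hypothetical, the labels and scores are invented toy values (1 = driver, 0 = passenger), and the precision-recall area is approximated here by the step-wise average precision, which is one common convention among several.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    (driver) scores higher than a randomly chosen negative (passenger).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: for each distinct
    score threshold, add (increase in recall) * (precision at threshold)."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    true_pos = 0
    ap = 0.0
    i = 0
    while i < len(ranked):
        # Group samples that share the same score into one threshold.
        j = i
        group_pos = 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            group_pos += ranked[j][1]
            j += 1
        true_pos += group_pos
        precision = true_pos / j          # j samples predicted positive so far
        ap += (group_pos / total_pos) * precision
        i = j
    return ap


# Toy example (invented values, not data from the paper).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}, average precision = {toy_ap:.3f}")
```

On a balanced dataset such as the one described above, a random predictor would score about 0.5 on both metrics, which gives a baseline against which values like 0.83 and 0.80 can be read.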
Pages: 21