CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations

Cited by: 0
Authors
Wang, Lihua [1, 2]
Sun, Haiyang [3 ]
Yue, Zhenyu [4 ]
Xia, Junfeng [1 ]
Li, Xiaoyan [1 ]
Affiliations
[1] Anhui Univ, Inst Phys Sci & Informat Technol, Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei, Peoples R China
[2] HuangShan Univ, Sch Informat Engn, Huangshan, Anhui, Peoples R China
[3] Nankai Univ, State Key Lab Med Chem Biol, Tianjin, Peoples R China
[4] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
Source
PEERJ | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
Cancer; Machine learning; Driver missense mutation prediction; Benchmark quality; XGBoost; SYNONYMOUS VARIANTS; PATHOGENICITY; IDENTIFICATION; IMPACT; LUNG;
DOI
10.7717/peerj.17991
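Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract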
Most computational methods for predicting driver mutations have been trained on positive samples, while their negative samples are typically derived from statistical methods or putative samples. How well these negative samples represent the diversity of passenger mutations remains to be determined. To address this issue, we curated a balanced dataset comprising driver mutations sourced from the COSMIC database and high-quality passenger mutations obtained from the Cancer Passenger Mutation database. We then encoded the distinctive features of these mutations. Using feature correlation analysis and feature selection with the ensemble learning technique XGBoost, we developed a cancer driver missense mutation predictor called CDMPred. With the top 10 features and XGBoost, CDMPred achieved area under the receiver operating characteristic curve (AUC) values of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred outperformed existing state-of-the-art methods for cancer-specific and general diseases, as measured by AUC and area under the precision-recall curve. Including high-quality passenger mutations in the training data proved advantageous for CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations and will further our understanding of personalized therapy.
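The abstract reports performance as AUC and area under the precision-recall curve. As a rough illustration of what these two metrics measure, the following minimal sketch computes both from scratch. It is not the authors' implementation: the function names `roc_auc` and `average_precision` are hypothetical, the labels and scores are toy values invented for illustration, and average precision is used here as one common step-wise estimate of the area under the precision-recall curve.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive (driver)
    receives a higher score than a randomly chosen negative (passenger).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values taken at the rank
    of each positive sample, with samples sorted by decreasing score.
    This simple version does not treat tied scores specially."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at this recall step
    return ap / total_pos


# Toy data (hypothetical): 1 = driver mutation, 0 = passenger mutation.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_scores))
    print("Average precision:", average_precision(toy_labels, toy_scores))
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for a small illustration; a rank-based formulation would be preferable for large benchmark sets.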
Pages: 21
Related papers
50 in total
  • [21] PmmNDD: Predicting the Pathogenicity of Missense Mutations in Neurodegenerative Diseases via Ensemble Learning
    Li, Xijian
    Huang, Ying
    Tang, Runxuan
    Xiao, Guangcheng
    Chen, Xiaochuan
    He, Ruilin
    Zhang, Zhaolei
    Luo, Jiana
    Wei, Yanjie
    Mao, Yijun
    Zhang, Huiling
    BIOINFORMATICS RESEARCH AND APPLICATIONS, PT III, ISBRA 2024, 2024, 14956 : 64 - 75
  • [22] An Evolutionary Approach for Identifying Driver Mutations in Colorectal Cancer
    Foo, Jasmine
    Liu, Lin L.
    Leder, Kevin
    Riester, Markus
    Iwasa, Yoh
    Lengauer, Christoph
    Michor, Franziska
    PLOS COMPUTATIONAL BIOLOGY, 2015, 11 (09)
  • [23] dbCPM: a manually curated database for exploring the cancer passenger mutations
    Yue, Zhenyu
    Zhao, Le
    Xia, Junfeng
    BRIEFINGS IN BIOINFORMATICS, 2020, 21 (01) : 309 - 317
  • [24] Identifying the Molecular Drivers of Pathogenic Aldehyde Dehydrogenase Missense Mutations in Cancer and Non-Cancer Diseases
    Jessen-Howard, Dana
    Pan, Qisheng
    Ascher, David B.
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2023, 24 (12)
  • [25] Understanding oncogenicity of cancer driver genes and mutations in the cancer genomics era
    Porta-Pardo, Eduard
    Valencia, Alfonso
    Godzik, Adam
    FEBS LETTERS, 2020, 594 (24) : 4233 - 4246
  • [26] Structural and Functional Impact of Cancer-Related Missense Somatic Mutations
    Shi, Zhen
    Moult, John
    JOURNAL OF MOLECULAR BIOLOGY, 2011, 413 (02) : 495 - 512
  • [27] Cancer Missense Mutations Alter Binding Properties of Proteins and Their Interaction Networks
    Nishi, Hafumi
    Tyagi, Manoj
    Teng, Shaolei
    Shoemaker, Benjamin A.
    Hashimoto, Kosuke
    Alexov, Emil
    Wuchty, Stefan
    Panchenko, Anna R.
    PLOS ONE, 2013, 8 (06)
  • [28] Finding cancer driver mutations in the era of big data research
    Poulos R.C.
    Wong J.W.H.
    Biophysical Reviews, 2019, 11 (1) : 21 - 29
  • [29] Drugging multiple same-allele driver mutations in cancer
    Nussinov, Ruth
    Zhang, Mingzhen
    Maloney, Ryan
    Jang, Hyunbum
    EXPERT OPINION ON DRUG DISCOVERY, 2021, 16 (08) : 823 - 828
  • [30] Identifying Driver Genes Mutations with Clinical Significance in Thyroid Cancer
    Yu, Hyeong Won
    Afzal, Muhammad
    Hussain, Maqbool
    Kwon, Hyungju
    Park, Young Joo
    Choi, June Young
    Lee, Kyu Eun
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 67 (01) : 1241 - 1251