CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations

Cited by: 0
Authors
Wang, Lihua [1,2]
Sun, Haiyang [3 ]
Yue, Zhenyu [4 ]
Xia, Junfeng [1 ]
Li, Xiaoyan [1 ]
Affiliations
[1] Anhui Univ, Inst Phys Sci & Informat Technol, Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei, Peoples R China
[2] HuangShan Univ, Sch Informat Engn, Huangshan, Anhui, Peoples R China
[3] Nankai Univ, State Key Lab Med Chem Biol, Tianjin, Peoples R China
[4] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
Source
PEERJ | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
Cancer; Machine learning; Driver missense mutation prediction; Benchmark quality; XGBoost; SYNONYMOUS VARIANTS; PATHOGENICITY; IDENTIFICATION; IMPACT; LUNG;
DOI
10.7717/peerj.17991
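Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09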
Abstract
Most computational methods for predicting driver mutations have been trained using positive samples, while negative samples are typically derived from statistical methods or putative samples. Whether these negative samples are representative of the diversity of passenger mutations remains to be determined. To address this issue, we curated a balanced dataset comprising driver mutations sourced from the COSMIC database and high-quality passenger mutations obtained from the Cancer Passenger Mutation database. We then encoded the distinctive features of these mutations. Using feature correlation analysis and feature selection with the ensemble learning technique XGBoost, we developed a cancer driver missense mutation predictor called CDMPred. The proposed CDMPred method, using the top 10 features and XGBoost, achieved area under the receiver operating characteristic curve (AUC) values of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred outperformed existing state-of-the-art cancer-specific and general-disease methods, as measured by AUC and area under the precision-recall curve. Including high-quality passenger mutations in the training data proves advantageous for CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations, furthering our understanding of personalized therapy.
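The abstract reports performance as AUC and area under the precision-recall curve. As a minimal sketch of how these two metrics can be computed from a classifier's scores, the Python below uses only the standard library. The function names `roc_auc` and `average_precision` and the toy labels and scores are illustrative assumptions, not the authors' code or data, and average precision is used here as one common step-wise estimate of the area under the precision-recall curve.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for driver (positive), 0 for passenger (negative).
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the block of scores tied with pairs[i].
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision at each positive's rank.

    Assumes no tied scores (ties are broken by input order here).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    true_pos = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


# Made-up example: three drivers (1) and three passengers (0).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

On a balanced dataset such as the one described, the two metrics tend to agree in ranking methods; the precision-recall view becomes more informative when drivers are rare relative to passengers.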
Pages: 21
Related papers
50 in total
  • [1] Computational Prediction of Driver Missense Mutations in Melanoma
    Sun, Haiyang
    Yue, Zhenyu
    Zhao, Le
    Xia, Junfeng
    Bin, Yannan
    Zhang, Di
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, PT II, 2018, 10955 : 438 - 447
  • [2] Predicting Oncogenic Missense Mutations
    Lei, Xue
    Wang, Boshen
    Perez-Rathke, Alan
    Tian, Wei
    Chou, Chia-Yi
    Tseng, Yan-Yuan
    Liang, Jie
    2019 IEEE EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL & HEALTH INFORMATICS (BHI), 2019,
  • [3] Discrimination of driver and passenger mutations in epidermal growth factor receptor in cancer
    Anoosha, P.
    Huang, Liang-Tsung
    Sakthivel, R.
    Karunagaran, D.
    Gromiha, M. Michael
    MUTATION RESEARCH-FUNDAMENTAL AND MOLECULAR MECHANISMS OF MUTAGENESIS, 2015, 780 : 24 - 34
  • [4] Using passenger mutations to estimate the timing of driver mutations and identify mutator alterations
    Youn, Ahrim
    Simon, Richard
    BMC BIOINFORMATICS, 2013, 14
  • [5] Comprehensive assessment of computational algorithms in predicting cancer driver mutations
    Chen, Hu
    Li, Jun
    Wang, Yumeng
    Ng, Patrick Kwok-Shing
    Tsang, Yiu Huen
    Shaw, Kenna R.
    Mills, Gordon B.
    Liang, Han
    GENOME BIOLOGY, 2020, 21 (01)
  • [6] CScape-somatic: distinguishing driver and passenger point mutations in the cancer genome
    Rogers, Mark F.
    Gaunt, Tom R.
    Campbell, Colin
    BIOINFORMATICS, 2020, 36 (12) : 3637 - 3644
  • [7] Accumulation of driver and passenger mutations during tumor progression
    Bozic, Ivana
    Antal, Tibor
    Ohtsuki, Hisashi
    Carter, Hannah
    Kim, Dewey
    Chen, Sining
    Karchin, Rachel
    Kinzler, Kenneth W.
    Vogelstein, Bert
    Nowak, Martin A.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (43) : 18545 - 18550
  • [8] Evolutionary triage governs fitness in driver and passenger mutations and suggests targeting never mutations
    Gatenby, R. A.
    Cunningham, J. J.
    Brown, J. S.
    NATURE COMMUNICATIONS, 2014, 5
  • [9] Driver mutations of cancer epigenomes
    Roy, David M.
    Walsh, Logan A.
    Chan, Timothy A.
    PROTEIN & CELL, 2014, 5 (04) : 265 - 296
  • [10] Distinguishing between driver and passenger mutations in individual cancer genomes by network enrichment analysis
    Merid, Simon Kebede
    Goranskaya, Daria
    Alexeyenko, Andrey
    BMC BIOINFORMATICS, 2014, 15