CDMPred: a tool for predicting cancer driver missense mutations with high-quality passenger mutations

Times cited: 0
Authors
Wang, Lihua [1, 2]
Sun, Haiyang [3]
Yue, Zhenyu [4]
Xia, Junfeng [1]
Li, Xiaoyan [1]
Affiliations
[1] Anhui Univ, Inst Phys Sci & Informat Technol, Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei, Peoples R China
[2] HuangShan Univ, Sch Informat Engn, Huangshan, Anhui, Peoples R China
[3] Nankai Univ, State Key Lab Med Chem Biol, Tianjin, Peoples R China
[4] Anhui Agr Univ, Sch Informat & Artificial Intelligence, Hefei 230036, Anhui, Peoples R China
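Source
PeerJ | 2024 / Vol. 12
Funding
National Natural Science Foundation of China
Keywords
Cancer; Machine learning; Driver missense mutation prediction; Benchmark quality; XGBoost; SYNONYMOUS VARIANTS; PATHOGENICITY; IDENTIFICATION; IMPACT; LUNG
DOI
10.7717/peerj.17991
Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract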
Most computational methods for predicting driver mutations have been trained on curated positive samples (driver mutations), whereas their negative samples are typically derived from statistical methods or are putative passenger mutations. Whether such negative samples capture the diversity of true passenger mutations remains to be determined. To address this issue, we curated a balanced dataset comprising driver mutations from the COSMIC database and high-quality passenger mutations from the Cancer Passenger Mutation database, and then encoded the distinctive features of these mutations. Guided by feature correlation analysis and by feature selection with the ensemble learning method XGBoost, we developed CDMPred, a predictor of cancer driver missense mutations. Using the top 10 features and XGBoost, CDMPred achieved area under the receiver operating characteristic curve (AUC) values of 0.83 and 0.80 on the training and independent test sets, respectively. Furthermore, CDMPred outperformed existing state-of-the-art methods, both cancer-specific and general disease-oriented, as measured by AUC and by the area under the precision-recall curve. Including high-quality passenger mutations in the training data therefore benefits CDMPred's prediction performance. We anticipate that CDMPred will be a valuable tool for predicting cancer driver mutations and will further our understanding of personalized therapy.
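The abstract describes keeping the top 10 features ranked by XGBoost and reports performance as the AUC and the area under the precision-recall curve. The sketch below illustrates those steps in plain Python under stated assumptions; it is not the authors' CDMPred code. It assumes importance scores are already available as a dictionary (in practice they might come from, e.g., the `feature_importances_` attribute of xgboost's `XGBClassifier`; the abstract does not say which importance type was used), that label 1 marks a driver and label 0 a passenger mutation, and that the precision-recall area is estimated by average precision. The helper names (`select_top_k`, `roc_auc`, `average_precision`) and all toy inputs are hypothetical.

```python
from typing import Dict, List, Sequence


def select_top_k(importances: Dict[str, float], k: int = 10) -> List[str]:
    """Return the k feature names with the highest importance scores.

    Hypothetical helper: `importances` is assumed to map feature name ->
    importance score (e.g. exported from a fitted XGBoost model). Ties are
    broken alphabetically so the selection is deterministic.
    """
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC computed as the normalized Mann-Whitney U statistic.

    This is the probability that a randomly chosen positive (driver, label 1)
    receives a higher score than a randomly chosen negative (passenger,
    label 0); tied pairs count as one half. O(P * N), fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common estimate of the area under the PR curve.

    Samples are ranked by descending score; the precision at the rank of each
    positive is averaged over all positives. Assumes no tied scores (tied
    samples would keep their input order because Python's sort is stable).
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Toy, hypothetical inputs for illustration only (not data from the paper).
    importances = {"feat_a": 0.1, "feat_b": 0.5, "feat_c": 0.3}
    print(select_top_k(importances, k=2))  # ['feat_b', 'feat_c']

    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(roc_auc(labels, scores))  # 8/9, about 0.889
    print(average_precision(labels, scores))  # 11/12, about 0.917
```

Averaging the precision at each positive's rank matches the usual step-wise (non-interpolated) definition of average precision when scores are untied; trapezoidal interpolation of the precision-recall curve can give a slightly different value, so reported AUPRC figures depend on which estimator a study uses.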
Pages: 21