mvPPT: A Highly Efficient and Sensitive Pathogenicity Prediction Tool for Missense Variants

Cited by: 4
Authors
Tong, Shi-Yuan [1]
Fan, Ke [1]
Zhou, Zai-Wei [2]
Liu, Lin-Yun [1]
Zhang, Shu-Qing [1]
Fu, Yinghui [1]
Wang, Guang-Zhong [3]
Zhu, Ying [4]
Yu, Yong-Chun [1]
Affiliations
[1] Fudan Univ, Jingan Dist Cent Hosp Shanghai, Inst Brain Sci, MOE Frontiers Ctr Brain Sci,State Key Lab Med Neur, Shanghai 200032, Peoples R China
[2] Shanghai Xunyin Biotechnol Co Ltd, Shanghai 201802, Peoples R China
[3] Univ Chinese Acad Sci, Chinese Acad Sci, Shanghai Inst Nutr & Hlth, CAS Key Lab Computat Biol, Shanghai 200031, Peoples R China
[4] Fudan Univ, MOE Frontiers Ctr Brain Sci, Inst Brain Sci, State Key Lab Med Neurobiol,Huashan Hosp, Shanghai 200032, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Machine learning; Missense variant; Genomics; Computational biology; Pathogenicity prediction; Functional impact; Database; Mutations; Diagnosis; Elements
DOI
10.1016/j.gpb.2022.07.005
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Next-generation sequencing technologies both boost the discovery of variants in the human genome and exacerbate the challenges of pathogenic variant identification. In this study, we developed the Pathogenicity Prediction Tool for missense variants (mvPPT), a highly sensitive and accurate missense variant classifier based on gradient boosting. mvPPT adopts high-confidence training sets with a wide spectrum of variant profiles, and extracts three categories of features: scores from existing prediction tools, frequencies (allele frequencies, amino acid frequencies, and genotype frequencies), and genomic context. Compared with established predictors, mvPPT achieves superior performance in all test sets, regardless of data source. In addition, our study provides guidance on training set and feature selection strategies and reveals highly relevant features, which may provide further biological insights into variant pathogenicity. mvPPT is freely available at http://www.mvppt.club/.
Pages: 414-426
Page count: 13