SVFX: a machine learning framework to quantify the pathogenicity of structural variants

Cited by: 21
Authors
Kumar, Sushant [1, 2]
Harmanci, Arif [3]
Vytheeswaran, Jagath [4]
Gerstein, Mark B. [1, 2, 5]
Affiliations
[1] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[2] Yale Univ, Dept Mol Biophys & Biochem, POB 6666, New Haven, CT 06520 USA
[3] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, Ctr Precis Hlth, Houston, TX 77030 USA
[4] CALTECH, Dept Comp & Math Sci, Pasadena, CA 91125 USA
[5] Yale Univ, Dept Comp Sci, 260-266 Whitney Ave,POB 208114, New Haven, CT 06520 USA
Funding
National Institutes of Health (USA)
Keywords
IMPACT; SETD3; MUTATIONS;
DOI
10.1186/s13059-020-02178-x
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
There is a lack of approaches for identifying pathogenic genomic structural variants (SVs) although they play a crucial role in many diseases. We present a mechanism-agnostic machine learning-based workflow, called SVFX, to assign pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline training models, which include genomic, epigenomic, and conservation-based features, for SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases; SVFX achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.
Pages: 21
Related papers
50 in total
  • [1] LYRUS: a machine learning model for predicting the pathogenicity of missense variants
    Lai, Jiaying
    Yang, Jordan
    Gamsiz Uzun, Ece D.
    Rubenstein, Brenda M.
    Sarkar, Indra Neil
    BIOINFORMATICS ADVANCES, 2022, 2 (01)
  • [2] Pathogenicity Prediction of Single Amino Acid Variants With Machine Learning Model Based on Protein Structural Energies
    Wu, Tzu-Hsuan
    Lin, Peng-Chan
    Chou, Hsin-Hung
    Shen, Meng-Ru
    Hsieh, Sun-Yuan
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (01) : 606 - 615
  • [3] DTreePred: an online viewer based on machine learning for pathogenicity prediction of genomic variants
    Gomes, Daniel Henrique Ferreira
    Medeiros, Inacio Gomes
    Petta, Tirzah Braz
    Stransky, Beatriz
    de Souza, Jorge Estefano Santana
    BMC BIOINFORMATICS, 2025, 26 (01) : 101
  • [4] StrVCTVRE: A supervised learning method to predict the pathogenicity of human genome structural variants
    Sharo, Andrew G.
    Hu, Zhiqiang
    Sunyaev, Shamil R.
    Brenner, Steven E.
    AMERICAN JOURNAL OF HUMAN GENETICS, 2022, 109 (02) : 195 - 209
  • [5] SVPath: an accurate pipeline for predicting the pathogenicity of human exon structural variants
    Yang, Yaning
    Wang, Xiaoqi
    Zhou, Deshan
    Wei, Dong-Qing
    Peng, Shaoliang
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (02)
  • [6] Gene-specific machine learning model to predict the pathogenicity of BRCA2 variants
    Khandakji, Mohannad N. N.
    Mifsud, Borbala
    FRONTIERS IN GENETICS, 2022, 13
  • [7] MVP predicts the pathogenicity of missense variants by deep learning
    Qi, Hongjian
    Zhang, Haicang
    Zhao, Yige
    Chen, Chen
    Long, John J.
    Chung, Wendy K.
    Guan, Yongtao
    Shen, Yufeng
    NATURE COMMUNICATIONS, 2021, 12 (01)
  • [8] Finding small somatic structural variants in exome sequencing data: a machine learning approach
    Kuhn, Matthias
    Stange, Thoralf
    Herold, Sylvia
    Thiede, Christian
    Roeder, Ingo
    COMPUTATIONAL STATISTICS, 2018, 33 (03) : 1145 - 1158
  • [9] SIGMA leverages protein structural information to predict the pathogenicity of missense variants
    Zhao, Hengqiang
    Du, Huakang
    Zhao, Sen
    Chen, Zefu
    Li, Yaqi
    Xu, Kexin
    Liu, Bowen
    Cheng, Xi
    Wen, Wen
    Li, Guozhuang
    Chen, Guilin
    Zhao, Zhengye
    Qiu, Guixing
    Liu, Pengfei
    Zhang, Terry Jianguo
    Wu, Zhihong
    Wu, Nan
    CELL REPORTS METHODS, 2024, 4 (01)
  • [10] 3Cnet: pathogenicity prediction of human variants using multitask learning with evolutionary constraints
    Won, Dhong-Gun
    Kim, Dong-Wook
    Woo, Junwoo
    Lee, Kyoungyeul
    BIOINFORMATICS, 2021, 37 (24) : 4626 - 4634