SVFX: a machine learning framework to quantify the pathogenicity of structural variants

Cited by: 21
Authors
Kumar, Sushant [1, 2]
Harmanci, Arif [3]
Vytheeswaran, Jagath [4]
Gerstein, Mark B. [1, 2, 5]
Affiliations
[1] Yale Univ, Program Computat Biol & Bioinformat, New Haven, CT 06520 USA
[2] Yale Univ, Dept Mol Biophys & Biochem, POB 6666, New Haven, CT 06520 USA
[3] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, Ctr Precis Hlth, Houston, TX 77030 USA
[4] CALTECH, Dept Comp & Math Sci, Pasadena, CA 91125 USA
[5] Yale Univ, Dept Comp Sci, 260-266 Whitney Ave,POB 208114, New Haven, CT 06520 USA
Funding
National Institutes of Health (USA)
Keywords
IMPACT; SETD3; MUTATIONS;
DOI
10.1186/s13059-020-02178-x
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
There is a lack of approaches for identifying pathogenic genomic structural variants (SVs) although they play a crucial role in many diseases. We present a mechanism-agnostic machine learning-based workflow, called SVFX, to assign pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline training models, which include genomic, epigenomic, and conservation-based features, for SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases; SVFX achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.
Pages: 21