Interpretable prioritization of splice variants in diagnostic next-generation sequencing

Cited by: 39
Authors
Danis, Daniel [1]
Jacobsen, Julius O. B. [2]
Carmody, Leigh C. [1]
Gargano, Michael A. [1]
McMurry, Julie A. [3]
Hegde, Ayushi [1]
Haendel, Melissa A. [3]
Valentini, Giorgio [4, 5, 6]
Smedley, Damian [2]
Robinson, Peter N. [1, 7]
Affiliations
[1] Jackson Lab Genom Med, 10 Discovery Dr, Farmington, CT 06032 USA
[2] Queen Mary Univ London, Barts & London Sch Med & Dent Queen, William Harvey Res Inst, Charterhouse Sq, London EC1M 6BQ, England
[3] Univ Colorado, Anschutz Med Campus, Aurora, CO USA
[4] Univ Milan, Anacleto Lab, Dipartimento Informat, Via Celoria 18, I-20133 Milan, Italy
[5] Univ Milan, DSRC, Via Celoria 18, I-20133 Milan, Italy
[6] CINI Natl Lab Artificial Intelligence & Intellige, Rome, Italy
[7] Univ Connecticut, Inst Syst Genom, Farmington, CT 06032 USA
Funding
EU Horizon 2020;
Keywords
DEFECTS; GENE; ASSOCIATION; GUIDELINES; MUTATIONS;
DOI
10.1016/j.ajhg.2021.06.014
Chinese Library Classification (CLC) number
Q3 [Genetics];
Subject classification code
071007; 090102;
Abstract
A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5′ and 3′ ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers for the donor and for the acceptor and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS transcends previous state-of-the-art accuracy in classifying splice variants as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize predicted effects of variants on splicing to make it easier to interpret splice variants in diagnostic settings.
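The abstract mentions features derived from the information content of wild-type and variant splice-site sequences. The sketch below illustrates one common way such a feature can be computed: a Schneider-style individual information score (in bits) of a sequence under a position frequency matrix, and its change between reference and alternate alleles. It is a minimal illustration, not the SQUIRLS implementation: the 4-nt matrix `TOY_FREQS` and the function names are hypothetical, and the frequencies are made up for the example.

```python
import math

# Hypothetical per-position nucleotide frequencies for a 4-nt toy "site".
# These numbers are invented for illustration; they are NOT SQUIRLS parameters.
TOY_FREQS = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
]


def individual_information(seq, freqs=TOY_FREQS):
    """Sum over positions of 2 + log2(frequency of the observed base), in bits."""
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the matrix length")
    return sum(2.0 + math.log2(col[base]) for base, col in zip(seq.upper(), freqs))


def delta_information(ref_seq, alt_seq, freqs=TOY_FREQS):
    """Change in information content caused by the variant (alt minus ref)."""
    return individual_information(alt_seq, freqs) - individual_information(ref_seq, freqs)


if __name__ == "__main__":
    # A variant replacing the well-supported G at the second position lowers the score.
    print(delta_information("GGTA", "GATA"))
```

A negative difference indicates that the variant sequence matches the toy matrix less well than the reference, which is the kind of signal a downstream classifier could use as one input feature.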
Pages: 1564-1577
Page count: 14