Machine Learning Approaches for the Prioritization of Genomic Variants Impacting Pre-mRNA Splicing

Cited by: 34
Authors
Rowlands, Charlie F. [1, 2]
Baralle, Diana [3]
Ellingford, Jamie M. [1, 2]
Affiliations
[1] Manchester Univ Hosp NHS Fdn Trust, St Marys Hosp, Manchester Ctr Genom Med, North West Genom Lab Hub, Manchester M13 9WJ, Lancs, England
[2] Univ Manchester, Fac Biol Med & Hlth, Sch Biol Sci, Div Evolut & Genom Sci, Manchester M13 9PR, Lancs, England
[3] Univ Southampton, Fac Med, Human Dev & Hlth, MP808,Tremona Rd, Southampton SO16 6YD, Hants, England
Funding
UK Medical Research Council
Keywords
Mendelian disease; diagnostics; variant interpretation; variant prioritization; RNA splicing; bioinformatics; machine learning; genomic medicine; effect prediction; GENE MUTATION DATABASE; POLYPYRIMIDINE TRACT; ACCEPTOR-SITE; IN-VIVO; IDENTIFICATION; EXON; RARE; EXPRESSION; DIAGNOSIS; ELEMENTS;
DOI
10.3390/cells8121513
Chinese Library Classification
Q2 [Cell Biology]
Subject classification codes
071009; 090102
Abstract
Defects in pre-mRNA splicing are frequently a cause of Mendelian disease. Despite the advent of next-generation sequencing, allowing a deeper insight into a patient's variant landscape, the ability to characterize variants causing splicing defects has not progressed with the same speed. To address this, recent years have seen a sharp spike in the number of splice prediction tools leveraging machine learning approaches, leaving clinical geneticists with a plethora of choices for in silico analysis. In this review, some basic principles of machine learning are introduced in the context of genomics and splicing analysis. A critical comparative approach is then used to describe seven recent machine learning-based splice prediction tools, revealing highly diverse approaches and common caveats. We find that, although great progress has been made in producing specific and sensitive tools, there is still much scope for personalized approaches to prediction of variant impact on splicing. Such approaches may increase diagnostic yields and underpin improvements to patient care.
Pages: 22