Improved modeling of RNA-binding protein motifs in an interpretable neural model of RNA splicing

Authors
Kavi Gupta
Chenxi Yang
Kayla McCue
Osbert Bastani
Phillip A. Sharp
Christopher B. Burge
Armando Solar-Lezama
Affiliations
[1] Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
[2] University of Texas at Austin, Department of Computer Science
[3] Massachusetts Institute of Technology, Department of Biology
[4] University of Pennsylvania, Department of Computer and Information Science
[5] Koch Institute of Integrative Cancer Research
[6] Massachusetts Institute of Technology
Source
Genome Biology, vol. 25
Keywords
Alternative splicing; Genome interpretation; Machine learning; Neural network; RNA processing; RNA-binding protein; Variant interpretation
Abstract
Sequence-specific RNA-binding proteins (RBPs) play central roles in splicing decisions. Here, we describe a modular splicing architecture that leverages in vitro-derived RNA affinity models for 79 human RBPs and the annotated human genome to produce improved models of RBP binding and activity. Binding and activity are modeled by separate Motif and Aggregator components that can be mixed and matched, enforcing sparsity to improve interpretability. Training a new Adjusted Motif (AM) architecture on the splicing task not only yields better splicing predictions but also improves prediction of RBP-binding sites in vivo and of splicing activity, assessed using independent data.
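To make the Motif/Aggregator split in the abstract more concrete, here is a minimal toy sketch in plain Python. It is not the authors' architecture or code: the weight-matrix scoring, the threshold used for sparsity, the weighted-sum aggregator, and all names (`motif_scores`, `sparsify`, `aggregate`, `toy_gu`) are assumptions made purely for illustration.

```python
from typing import Dict, List


def motif_scores(seq: str, pwm: List[Dict[str, float]]) -> List[float]:
    """Motif component (toy): score every window of len(pwm) in seq
    by summing the per-position weight of each base."""
    width = len(pwm)
    return [
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    ]


def sparsify(scores: List[float], threshold: float) -> List[float]:
    """Sparsity step (assumed form): activations at or below the threshold
    become 0, so only a few positions stay active."""
    return [max(0.0, s - threshold) for s in scores]


def aggregate(channels: Dict[str, List[float]], weights: Dict[str, float]) -> float:
    """Aggregator component (toy): weighted sum of the sparse motif channels."""
    return sum(weights[name] * sum(acts) for name, acts in channels.items())


# Hypothetical two-position motif that prefers "GU".
toy_gu = [
    {"A": -1, "C": -1, "G": 1, "U": -1},
    {"A": -1, "C": -1, "G": -1, "U": 1},
]

raw = motif_scores("AGUAC", toy_gu)          # one score per window
sparse = sparsify(raw, threshold=1.0)        # only the GU window stays active
score = aggregate({"toy_gu": sparse}, {"toy_gu": 0.5})
```

Because the two components only communicate through the sparse per-position activations, either one can be swapped out independently, which is the "mix and match" property the abstract describes.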