rMATS-turbo: an efficient and flexible computational tool for alternative splicing analysis of large-scale RNA-seq data

Cited by: 52
Authors
Wang, Yuanyuan [1, 2]
Xie, Zhijie [2]
Kutschera, Eric [2]
Adams, Jenea I. [2, 3]
Kadash-Edmondson, Kathryn E. [2]
Xing, Yi [2, 4, 5]
Affiliations
[1] Univ Calif Los Angeles, Bioinformat Interdept Grad Program, Los Angeles, CA USA
[2] Childrens Hosp Philadelphia, Ctr Computat & Genom Med, Philadelphia, PA 19104 USA
[3] Univ Penn, Genom & Computat Biol Grad Program, Philadelphia, PA USA
[4] Univ Penn, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[5] Childrens Hosp Philadelphia, Dept Biomed & Hlth Informat, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (NIH)
Keywords
QUANTIFICATION; TRANSCRIPTOME; LANDSCAPE; BINDING
DOI
10.1038/s41596-023-00944-2
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Pre-mRNA alternative splicing is a prevalent mechanism for diversifying eukaryotic transcriptomes and proteomes. Regulated alternative splicing plays a role in many biological processes, and dysregulated alternative splicing is a feature of many human diseases. Short-read RNA sequencing (RNA-seq) is now the standard approach for transcriptome-wide analysis of alternative splicing. Since 2011, our laboratory has developed and maintained Replicate Multivariate Analysis of Transcript Splicing (rMATS), a computational tool for discovering and quantifying alternative splicing events from RNA-seq data. Here we provide a protocol for the contemporary version of rMATS, rMATS-turbo, a fast and scalable re-implementation that maintains the statistical framework and user interface of the original rMATS software, while incorporating a revamped computational workflow with a substantial improvement in speed and data storage efficiency. The rMATS-turbo software scales up to massive RNA-seq datasets with tens of thousands of samples. To illustrate the utility of rMATS-turbo, we describe two representative application scenarios. First, we describe a broadly applicable two-group comparison to identify differential alternative splicing events between two sample groups, including both annotated and novel alternative splicing events. Second, we describe a quantitative analysis of alternative splicing in a large-scale RNA-seq dataset (1,000 samples), including the discovery of alternative splicing events associated with distinct cell states. We detail the workflow and features of rMATS-turbo that enable efficient parallel processing and analysis of large-scale RNA-seq datasets on a compute cluster. We anticipate that this protocol will help the broad user base of rMATS-turbo make the best use of this software for studying alternative splicing in diverse biological systems.
Pages: 1083-1104
Page count: 25