VIPER: Visualization Pipeline for RNA-seq, a Snakemake workflow for efficient and complete RNA-seq analysis

Times cited: 141
Authors
Cornwell, MacIntosh [1]
Vangala, Mahesh [6]
Taing, Len [1, 2]
Herbert, Zachary [7]
Koester, Johannes [1, 5]
Li, Bo [3, 4]
Sun, Hanfei [8]
Li, Taiwen [9]
Zhang, Jian [10]
Qiu, Xintao [1, 2]
Pun, Matthew [1]
Jeselsohn, Rinath [1, 2]
Brown, Myles [1, 2]
Liu, X. Shirley [1, 2, 3, 4]
Long, Henry W. [1, 2]
Affiliations
[1] Dana Farber Canc Inst, Dept Med Oncol, Boston, MA 02215 USA
[2] Dana Farber Canc Inst, Ctr Funct Canc Epigenet, Boston, MA 02215 USA
[3] Dana Farber Canc Inst, Dept Biostat & Computat Biol, Boston, MA 02215 USA
[4] Harvard Sch Publ Hlth, Boston, MA 02215 USA
[5] Univ Duisburg Essen, Inst Human Genet, Essen, Germany
[6] Univ Massachusetts, Sch Med, Worcester, MA 01655 USA
[7] Dana Farber Canc Inst, Mol Biol Core Facil, Boston, MA 02215 USA
[8] Tongji Univ, Sch Life Sci, Dept Bioinformat, Shanghai 200092, Peoples R China
[9] Sichuan Univ, West China Hosp Stomatol, State Key Lab Oral Dis, Chengdu, Sichuan, Peoples R China
[10] Beijing Inst Basic Med Sci, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Institutes of Health (USA)
Keywords
RNA-seq; Analysis; Pipeline; Snakemake; Gene fusion; Immunological infiltrate; INTEGRATION; DISCOVERY; PACKAGE; CANCER; TRANSCRIPTS
DOI
10.1186/s12859-018-2139-9
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: RNA sequencing (RNA-seq) has become a ubiquitous technology throughout the life sciences as an effective method for quantitatively measuring RNA abundance in tissues and cells. Its increasing use has driven the continuous development of new tools for every step of analysis, from alignment to downstream pathway analysis. However, using these tools effectively in a scalable and reproducible way can be challenging, especially for non-experts.
Results: Using the workflow management system Snakemake, we have developed a user-friendly, fast, efficient, and comprehensive pipeline for RNA-seq analysis. VIPER (Visualization Pipeline for RNA-seq analysis) is an analysis workflow that combines some of the most popular tools to take RNA-seq analysis from raw sequencing data, through alignment and quality control, to downstream differential expression and pathway analysis. VIPER is built in a modular fashion so that new tools can be incorporated rapidly to expand its capabilities. This modularity has already been used to add recently developed tools for exploring immune infiltrate and for reconstructing T-cell CDRs (complementarity-determining regions). The pipeline is packaged so that only minimal computational skills are needed to download and install the dozens of software packages VIPER uses.
Conclusions: VIPER is a comprehensive solution that performs most standard RNA-seq analyses quickly and effectively, with built-in capacity for customization and expansion.
Pages: 14