Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses

Cited by: 68
Authors
Golosova, Olga [1]
Henderson, Ross [2]
Vaskin, Yuriy [1]
Gabrielian, Andrei [2]
Grekhov, German [1]
Nagarajan, Vijayaraj [2]
Oler, Andrew J. [2]
Quiñones, Mariam [2]
Hurt, Darrell [2]
Fursov, Mikhail [1]
Huyen, Yentram [2]
Affiliations
[1] Unipro Ctr Informat Technol, Novosibirsk, Russia
[2] NIAID, Bioinformat & Computat Biosci Branch, Off Cyber Infrastruct & Computat Biol, NIH, Bethesda, MD 20892 USA
Keywords
Bioinformatics; Next-generation sequencing; Data analysis; ChIP-seq; Variant calling; RNA-seq; READ ALIGNMENT; WORKFLOWS; TOPHAT; GENE;
DOI
10.7717/peerj.644
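Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General]
Discipline classification codes
07; 0710; 09
Abstract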
The advent of Next Generation Sequencing (NGS) technologies has opened new possibilities for researchers. However, the more biology becomes a data-intensive field, the more biologists have to learn how to process and analyze NGS data with complex computational tools. Even with the availability of common pipeline specifications, it is often a time-consuming and cumbersome task for a bench scientist to install and configure the pipeline tools. We believe that a unified, desktop and biologist-friendly front end to NGS data analysis tools will substantially improve productivity in this field. Here we present NGS pipelines "Variant Calling with SAMtools", "Tuxedo Pipeline for RNA-seq Data Analysis" and "Cistrome Pipeline for ChIP-seq Data Analysis" integrated into the Unipro UGENE desktop toolkit. We describe the available UGENE infrastructure that helps researchers run these pipelines on different datasets, store and investigate the results and re-run the pipelines with the same parameters. These pipeline tools are included in the UGENE NGS package. Individual blocks of these pipelines are also available for expert users to create their own advanced workflows.
Pages: 15