SeqMule: automated pipeline for analysis of human exome/genome sequencing data

被引:51
作者
Guo, Yunfei [1 ,2 ]
Ding, Xiaolei [3 ]
Shen, Yufeng [4 ,5 ]
Lyon, Gholson J. [6 ,7 ]
Wang, Kai [1 ,2 ,7 ,8 ]
机构
[1] Univ So Calif, Zilkha Neurogenet Inst, Los Angeles, CA 90033 USA
[2] Univ So Calif, Keck Sch Med, Dept Prevent Med, Los Angeles, CA 90032 USA
[3] Nanjing Forestry Univ, Sch Forestry & Environm, Nanjing 210037, Jiangsu, Peoples R China
[4] Columbia Univ, Dept Syst Biol, New York, NY 10032 USA
[5] Columbia Univ, Dept Biomed Informat, New York, NY 10032 USA
[6] Cold Spring Harbor Lab, Stanley Inst Cognit Genom, New York, NY 11797 USA
[7] Utah Fdn Biomed Res, Provo, UT 84601 USA
[8] Univ So Calif, Keck Sch Med, Dept Psychiat & Behav Sci, Los Angeles, CA 90033 USA
关键词
READ ALIGNMENT; SNP-DETECTION; EXOME; TOOL; GALAXY; BIOINFORMATICS; ULTRAFAST; WORKFLOWS; DISCOVERY; MUTATION;
D O I
10.1038/srep14283
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and no access to high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate consensus set with high confidence. SeqMule integrates 5 alignment tools, 5 variant calling algorithms and accepts various combinations all by one-line command, therefore allowing highly flexible yet fully automated variant calling. In a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers turn-key solution for deployment on Amazon Web Services, allows quality check, Mendelian error check, consistency evaluation, HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
引用
收藏
页数:10
相关论文
共 47 条
[1]   Tavaxy: Integrating Taverna and Galaxy workflows with cloud computing support [J].
Abouelhoda, Mohamed ;
Issa, Shadi Alaa ;
Ghanem, Moustafa .
BMC BIOINFORMATICS, 2012, 13
[2]   Galaxy CloudMan: delivering cloud compute clusters [J].
Afgan, Enis ;
Baker, Dannon ;
Coraor, Nate ;
Chapman, Brad ;
Nekrutenko, Anton ;
Taylor, James .
BMC BIOINFORMATICS, 2010, 11
[3]   A map of human genome variation from population-scale sequencing [J].
Altshuler, David ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Collins, Francis S. ;
De la Vega, Francisco M. ;
Donnelly, Peter ;
Egholm, Michael ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Knoppers, Bartha M. ;
Lander, Eric S. ;
Lehrach, Hans ;
Mardis, Elaine R. ;
McVean, Gil A. ;
Nickerson, DebbieA. ;
Peltonen, Leena ;
Schafer, Alan J. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Deiros, David ;
Metzker, Mike ;
Muzny, Donna ;
Reid, Jeff ;
Wheeler, David ;
Wang, Jun ;
Li, Jingxiang ;
Jian, Min ;
Li, Guoqing ;
Li, Ruiqiang ;
Liang, Huiqing ;
Tian, Geng ;
Wang, Bo ;
Wang, Jian ;
Wang, Wei ;
Yang, Huanming ;
Zhang, Xiuqing ;
Zheng, Huisong ;
Lander, Eric S. ;
Altshuler, David L. ;
Ambrogio, Lauren ;
Bloom, Toby ;
Cibulskis, Kristian ;
Fennell, Tim J. ;
Gabriel, Stacey B. .
NATURE, 2010, 467 (7319) :1061-1073
[4]   An integrated map of genetic variation from 1,092 human genomes [J].
Altshuler, David M. ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Donnelly, Peter ;
Eichler, Evan E. ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Green, Eric D. ;
Hurles, Matthew E. ;
Knoppers, Bartha M. ;
Korbel, Jan O. ;
Lander, Eric S. ;
Lee, Charles ;
Lehrach, Hans ;
Mardis, Elaine R. ;
Marth, Gabor T. ;
McVean, Gil A. ;
Nickerson, Deborah A. ;
Schmidt, Jeanette P. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Dinh, Huyen ;
Kovar, Christie ;
Lee, Sandra ;
Lewis, Lora ;
Muzny, Donna ;
Reid, Jeff ;
Wang, Min ;
Wang, Jun ;
Fang, Xiaodong ;
Guo, Xiaosen ;
Jian, Min ;
Jiang, Hui ;
Jin, Xin ;
Li, Guoqing ;
Li, Jingxiang ;
Li, Yingrui ;
Li, Zhuo ;
Liu, Xiao ;
Lu, Yao ;
Ma, Xuedi ;
Su, Zhe ;
Tai, Shuaishuai ;
Tang, Meifang .
NATURE, 2012, 491 (7422) :56-65
[5]   CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing [J].
Angiuoli, Samuel V. ;
Matalka, Malcolm ;
Gussman, Aaron ;
Galens, Kevin ;
Vangala, Mahesh ;
Riley, David R. ;
Arze, Cesar ;
White, James R. ;
White, Owen ;
Fricke, W. Florian .
BMC BIOINFORMATICS, 2011, 12
[6]  
[Anonymous], ARXIV E PRINTS
[7]   Exome sequencing as a tool for Mendelian disease gene discovery [J].
Bamshad, Michael J. ;
Ng, Sarah B. ;
Bigham, Abigail W. ;
Tabor, Holly K. ;
Emond, Mary J. ;
Nickerson, Deborah A. ;
Shendure, Jay .
NATURE REVIEWS GENETICS, 2011, 12 (11) :745-755
[8]   ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence [J].
Blanca, Jose M. ;
Pascual, Laura ;
Ziarsolo, Peio ;
Nuez, Fernando ;
Canizares, Joaquin .
BMC GENOMICS, 2011, 12
[9]   wANNOVAR: annotating genetic variants for personal genomes via the web [J].
Chang, Xiao ;
Wang, Kai .
JOURNAL OF MEDICAL GENETICS, 2012, 49 (07) :433-436
[10]  
Chen H., 2011, CRAN