Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics

Cited by: 110
Authors
Kelly, Benjamin J. [1]
Fitch, James R. [1]
Hu, Yangqiu [1]
Corsmeier, Donald J. [1]
Zhong, Huachun [1]
Wetzel, Amy N. [1]
Nordquist, Russell D. [1]
Newsom, David L. [1]
White, Peter [1,2]
Affiliations
[1] Nationwide Childrens Hosp, Res Inst, Ctr Microbial Pathogenesis, Columbus, OH 43205 USA
[2] Ohio State Univ, Coll Med, Dept Pediat, Columbus, OH 43210 USA
Source
GENOME BIOLOGY | 2015 / Vol. 16
Funding
National Science Foundation (USA);
Keywords
SEQUENCE; CLOUD; FRAMEWORK; GENOTYPE; RESOURCE; FORMAT; SNP;
DOI
10.1186/s13059-014-0577-x
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
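The abstract describes Churchill as a balanced regional parallelization strategy. As a rough illustration of the general idea only (this is not Churchill's actual algorithm, whose details are not given in this record), the sketch below deterministically cuts a reference into contiguous, non-overlapping regions of near-equal size that could each be handed to an independent worker. The helper names `split_into_regions` and `region_size_for`, and the toy contig names and lengths, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A region is (contig, start, end) in 0-based, half-open coordinates.
Region = Tuple[str, int, int]


def split_into_regions(contig_lengths: Dict[str, int], region_size: int) -> List[Region]:
    """Cut every contig into consecutive, non-overlapping regions of at most
    region_size bases. Regions never cross contig boundaries, and contigs are
    visited in the dict's insertion order, so the output is identical for
    identical input (a deterministic partition)."""
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    regions: List[Region] = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, region_size):
            regions.append((contig, start, min(start + region_size, length)))
    return regions


def region_size_for(contig_lengths: Dict[str, int], n_regions: int) -> int:
    """Pick a region size that splits the whole genome into roughly n_regions pieces."""
    if n_regions < 1:
        raise ValueError("n_regions must be >= 1")
    total = sum(contig_lengths.values())
    return math.ceil(total / n_regions)


if __name__ == "__main__":
    # Toy "genome" with made-up contig names and lengths (hypothetical).
    genome = {"chrA": 1000, "chrB": 600, "chrC": 400}
    size = region_size_for(genome, 4)  # ceil(2000 / 4) = 500
    for region in split_into_regions(genome, size):
        print(region)
```

With the toy genome the target size is 500 bases, and because no region crosses a contig boundary the split yields five regions rather than four; the short tail regions are the cost of keeping each region within a single contig. Since the partition is a pure function of its input, rerunning it always produces the same regions, which is one simple way to keep a regional split reproducible.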
Pages: 14