High-quality draft assemblies of mammalian genomes from massively parallel sequence data

Cited by: 1146
Authors
Gnerre, Sante [1, 2]
MacCallum, Iain [1, 2]
Przybylski, Dariusz [1, 2]
Ribeiro, Filipe J. [1, 2]
Burton, Joshua N. [1, 2]
Walker, Bruce J. [1, 2]
Sharpe, Ted [1, 2]
Hall, Giles [1, 2]
Shea, Terrance P. [1, 2]
Sykes, Sean [1, 2]
Berlin, Aaron M. [1, 2]
Aird, Daniel [1, 2]
Costello, Maura [1, 2]
Daza, Riza [1, 2]
Williams, Louise [1, 2]
Nicol, Robert [1, 2]
Gnirke, Andreas [1, 2]
Nusbaum, Chad [1, 2]
Lander, Eric S. [1, 2, 3, 4]
Jaffe, David B. [1, 2]
Affiliations
[1] MIT, Broad Inst, Cambridge, MA 02142 USA
[2] Harvard Univ, Cambridge, MA 02142 USA
[3] MIT, Dept Biol, Cambridge, MA 02139 USA
[4] Harvard Univ, Sch Med, Dept Syst Biol, Boston, MA 02115 USA
Funding
U.S. National Institutes of Health
DOI
10.1073/pnas.1017351108
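Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09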
Abstract
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
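The abstract reports scaffold contiguity as an N50 size. For readers unfamiliar with the statistic, the following is a minimal sketch of the conventional definition: the largest length L such that scaffolds of length at least L together hold at least half of the assembled bases. The function name `n50` and the example lengths are illustrative assumptions and are not taken from the paper or from the ALLPATHS-LG software.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (an assumption here, not quoted from the paper):
    sort lengths from longest to shortest and return the length at which the
    running total first reaches half of the overall total.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in bases (illustrative only).
example_scaffolds = [100, 80, 60, 40, 20]
print(n50(example_scaffolds))
```

Because N50 weights scaffolds by the bases they contain, a few very long scaffolds can raise it substantially even when many short scaffolds remain, which is why it is usually read alongside accuracy and coverage figures rather than on its own.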
Pages: 1513-1518
Page count: 6