Plantagora: Modeling Whole Genome Sequencing and Assembly of Plant Genomes

Cited by: 23
Authors
Barthelson, Roger [1]
McFarlin, Adam J. [2]
Rounsley, Steven D. [1,3]
Young, Sarah [4]
Affiliations
[1] Univ Arizona, Inst BiO5, Tucson, AZ 85721 USA
[2] Apple Inc, Cupertino, CA USA
[3] Univ Arizona, Sch Plant Sci, Tucson, AZ USA
[4] Broad Inst, Cambridge, MA USA
Source
PLOS ONE | 2011 / Volume 6 / Issue 12
Funding
National Science Foundation (USA);
Keywords
ARABIDOPSIS GENOME; CHALLENGES;
DOI
10.1371/journal.pone.0028436
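Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code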
07 ; 0710 ; 09 ;
Abstract
Background: Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them.
Methodology/Principal Findings: For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website.
Conclusions/Significance: Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further.
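As a rough illustration of the two ideas the abstract describes (drawing simulated paired-end reads from a known genome, then scoring an assembly with sequence statistics), the sketch below may help. It is not Plantagora's simulator or metric set: the function names `simulate_paired_reads` and `n50`, the error-free reads, and the fixed insert size are all assumptions made for this example, and N50 is used only because it is a widely reported assembly statistic, not because the abstract names it.

```python
import random

def simulate_paired_reads(genome, n_pairs, read_len=50, insert_size=300, seed=0):
    """Sample error-free read pairs from a genome string (hypothetical helper).

    Each pair comes from one fragment of exactly insert_size bases; real
    simulators would also model sequencing errors and insert-size variation.
    """
    rng = random.Random(seed)
    complement = str.maketrans("ACGT", "TGCA")
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        forward = fragment[:read_len]
        # The mate is read from the opposite strand: reverse-complement it.
        reverse = fragment[-read_len:].translate(complement)[::-1]
        pairs.append((forward, reverse))
    return pairs

def n50(contig_lengths):
    """Largest length L such that contigs of length >= L hold half the bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs at all

if __name__ == "__main__":
    toy_genome = "ACGT" * 250  # stand-in for a real chromosome sequence
    reads = simulate_paired_reads(toy_genome, n_pairs=3, read_len=20, insert_size=100)
    print(reads[0])
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Varying `insert_size` across runs is one simple way to mimic the "varying paired end spacing" mentioned above; the fidelity-related measures in the abstract would additionally require aligning contigs back to the source genome, which this sketch does not attempt.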
Pages: 8