Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

Cited by: 229
Authors
VanBuren, Robert [1 ]
Bryant, Doug [1 ]
Edger, Patrick P. [2 ,3 ]
Tang, Haibao [4 ,5 ]
Burgess, Diane [2 ]
Challabathula, Dinakar [6 ]
Spittle, Kristi [7 ]
Hall, Richard [7 ]
Gu, Jenny [7 ]
Lyons, Eric [4 ]
Freeling, Michael [2 ]
Bartels, Dorothea [6 ]
Ten Hallers, Boudewijn [8 ]
Hastie, Alex [8 ]
Michael, Todd P. [9 ]
Mockler, Todd C. [1 ]
Affiliations
[1] Donald Danforth Plant Sci Ctr, St Louis, MO 63132 USA
[2] Univ Calif Berkeley, Dept Plant & Microbial Biol, Berkeley, CA 94720 USA
[3] Michigan State Univ, Dept Hort, E Lansing, MI 48323 USA
[4] Univ Arizona, Sch Plant Sci, IPlant Collaborat, Tucson, AZ 85721 USA
[5] Fujian Agr & Forestry Univ, HIST, Ctr Genom & Biotechnol, Fuzhou 350002, Peoples R China
[6] Univ Bonn, IMBIO, D-53115 Bonn, Germany
[7] Pacific Biosci, Menlo Pk, CA 94025 USA
[8] BioNano Genom, San Diego, CA 92121 USA
[9] Ibis Biosci, Carlsbad, CA 92008 USA
Funding
US National Science Foundation;
Keywords
STRUCTURAL VARIATION; GENOME COMPARISONS; TANDEM REPEATS; DNA; REVEALS; GENE; SIZE; IDENTIFICATION; TRANSCRIPTOME; COMPLEXITY;
DOI
10.1038/nature15714
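Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code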
07 ; 0710 ; 09 ;
Abstract
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly(1). The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE)(2). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
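The abstract summarizes assembly contiguity with an N50 length (2.4 megabases over 625 contigs). As a minimal sketch of how that statistic is commonly defined, the snippet below computes N50 from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the contig at which, walking from the longest
    contig to the shortest, the running total first reaches at least half
    of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not the Oropetium contigs.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths)  # total is 300; 80 + 70 = 150 reaches half, so 70
```

A higher N50 for the same total length means the sequence is concentrated in fewer, longer contigs, which is why the statistic is often quoted alongside the contig count.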
Pages: 508 / U209
Page count: 16
Related papers
55 in total
[1]   [Anonymous], NUCL ACIDS RES
[2]   Characterization of the human ESC transcriptome by hybrid sequencing [J].
Au, Kin Fai ;
Sebastiano, Vittorio ;
Afshar, Pegah Tootoonchi ;
Durruthy-Durruthy, Jens ;
Lee, Lawrence ;
Williams, Brian A. ;
van Bakel, Harm ;
Schadt, Eric E. ;
Reijo-Pera, Renee A. ;
Underwood, Jason G. ;
Wong, Wing Hung .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2013, 110 (50) :E4821-E4830
[3]   Bartels D, 2002, MAYDICA, V47, P185
[4]   Genome size is a strong predictor of cell size and stomatal density in angiosperms [J].
Beaulieu, Jeremy M. ;
Leitch, Ilia J. ;
Patel, Sunil ;
Pendharkar, Arjun ;
Knight, Charles A. .
NEW PHYTOLOGIST, 2008, 179 (04) :975-986
[5]   Reference genome sequence of the model plant Setaria [J].
Bennetzen, Jeffrey L. ;
Schmutz, Jeremy ;
Wang, Hao ;
Percifield, Ryan ;
Hawkins, Jennifer ;
Pontaroli, Ana C. ;
Estep, Matt ;
Feng, Liang ;
Vaughn, Justin N. ;
Grimwood, Jane ;
Jenkins, Jerry ;
Barry, Kerrie ;
Lindquist, Erika ;
Hellsten, Uffe ;
Deshpande, Shweta ;
Wang, Xuewen ;
Wu, Xiaomei ;
Mitros, Therese ;
Triplett, Jimmy ;
Yang, Xiaohan ;
Ye, Chu-Yu ;
Mauro-Herrera, Margarita ;
Wang, Lin ;
Li, Pinghua ;
Sharma, Manoj ;
Sharma, Rita ;
Ronald, Pamela C. ;
Panaud, Olivier ;
Kellogg, Elizabeth A. ;
Brutnell, Thomas P. ;
Doust, Andrew N. ;
Tuskan, Gerald A. ;
Rokhsar, Daniel ;
Devos, Katrien M. .
NATURE BIOTECHNOLOGY, 2012, 30 (06) :555-+
[6]   Bennetzen JL, 1997, PLANT CELL, V9, P1509, DOI 10.1105/tpc.9.9.1509
[7]   Tandem repeats finder: a program to analyze DNA sequences [J].
Benson, G .
NUCLEIC ACIDS RESEARCH, 1999, 27 (02) :573-580
[8]   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing [J].
Berlin, Konstantin ;
Koren, Sergey ;
Chin, Chen-Shan ;
Drake, James P. ;
Landolin, Jane M. ;
Phillippy, Adam M. .
NATURE BIOTECHNOLOGY, 2015, 33 (06) :623-+
[9]   Trimmomatic: a flexible trimmer for Illumina sequence data [J].
Bolger, Anthony M. ;
Lohse, Marc ;
Usadel, Bjoern .
BIOINFORMATICS, 2014, 30 (15) :2114-2120
[10]   MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes [J].
Cantarel, Brandi L. ;
Korf, Ian ;
Robb, Sofia M. C. ;
Parra, Genis ;
Ross, Eric ;
Moore, Barry ;
Holt, Carson ;
Alvarado, Alejandro Sanchez ;
Yandell, Mark .
GENOME RESEARCH, 2008, 18 (01) :188-196