Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

Cited by: 229
Authors
VanBuren, Robert [1]
Bryant, Doug [1]
Edger, Patrick P. [2,3]
Tang, Haibao [4,5]
Burgess, Diane [2]
Challabathula, Dinakar [6]
Spittle, Kristi [7]
Hall, Richard [7]
Gu, Jenny [7]
Lyons, Eric [4]
Freeling, Michael [2]
Bartels, Dorothea [6]
Ten Hallers, Boudewijn [8]
Hastie, Alex [8]
Michael, Todd P. [9]
Mockler, Todd C. [1]
Affiliations
[1] Donald Danforth Plant Sci Ctr, St Louis, MO 63132 USA
[2] Univ Calif Berkeley, Dept Plant & Microbial Biol, Berkeley, CA 94720 USA
[3] Michigan State Univ, Dept Hort, E Lansing, MI 48323 USA
[4] Univ Arizona, Sch Plant Sci, IPlant Collaborat, Tucson, AZ 85721 USA
[5] Fujian Agr & Forestry Univ, HIST, Ctr Genom & Biotechnol, Fuzhou 350002, Peoples R China
[6] Univ Bonn, IMBIO, D-53115 Bonn, Germany
[7] Pacific Biosci, Menlo Pk, CA 94025 USA
[8] BioNano Genom, San Diego, CA 92121 USA
[9] Ibis Biosci, Carlsbad, CA 92008 USA
Funding
U.S. National Science Foundation
Keywords
STRUCTURAL VARIATION; GENOME COMPARISONS; TANDEM REPEATS; DNA; REVEALS; GENE; SIZE; IDENTIFICATION; TRANSCRIPTOME; COMPLEXITY;
DOI
10.1038/nature15714
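Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Subject classification codes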
07 ; 0710 ; 09 ;
Abstract
Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly(1). The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE)(2). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
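The abstract summarizes assembly contiguity with an N50 length. As a minimal sketch of how that statistic is commonly defined (the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length), the following Python function may help. The function name `n50` and the toy contig lengths are hypothetical illustrations and are not taken from the paper or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the sum
    of all contig lengths.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units), not the Oropetium contigs.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In the toy example the total length is 300, so the running total reaches the halfway mark of 150 at the second-longest contig, giving an N50 of 70. Read the same way, the reported N50 of 2.4 megabases would mean that contigs of at least that length account for about half of the 244 megabases assembled.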
Pages: 508-U209
Page count: 16