The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads

Cited by: 318
Authors
Wang, Zhiwen [1 ]
Hobson, Neil [2 ]
Galindo, Leonardo [2 ]
Zhu, Shilin [1 ]
Shi, Daihu [1 ]
McDill, Joshua [2 ]
Yang, Linfeng [1 ]
Hawkins, Simon [3 ]
Neutelings, Godfrey [3 ]
Datla, Raju [4 ]
Lambert, Georgina [5 ,6 ]
Galbraith, David W. [5 ,6 ]
Grassa, Christopher J. [7 ]
Geraldes, Armando [7 ]
Cronk, Quentin C. [7 ]
Cullis, Christopher [8 ]
Dash, Prasanta K. [9 ]
Kumar, Polumetla A. [9 ]
Cloutier, Sylvie [10 ,11 ]
Sharpe, Andrew G. [4 ]
Wong, Gane K. -S. [1 ,2 ,12 ]
Wang, Jun [1 ,13 ,14 ]
Deyholos, Michael K. [2 ]
Affiliations
[1] Bei Shan Ind Zone, BGI Shenzhen, Shenzhen 518083, Peoples R China
[2] Univ Alberta, Dept Biol Sci, Edmonton, AB T6G 2E9, Canada
[3] Univ Lille 1, Unite Mixte Rech Inst Natl Rech Agron 1281, F-59650 Villeneuve Dascq, France
[4] Natl Res Council Canada, Inst Plant Biotechnol, Saskatoon, SK S7N 0W9, Canada
[5] Univ Arizona, Sch Plant Sci, Tucson, AZ 85721 USA
[6] BIO5 Inst, Tucson, AZ 85721 USA
[7] Univ British Columbia, Dept Bot, Vancouver, BC V6T 1Z4, Canada
[8] Case Western Reserve Univ, Cleveland, OH 44106 USA
[9] Indian Agr Res Inst, Natl Res Ctr Plant Biotechnol, New Delhi 110012, India
[10] Agr & Agri Food Canada, Winnipeg, MB R3T 2M1, Canada
[11] Univ Manitoba, Dept Plant Sci, Winnipeg, MB R3T 2N2, Canada
[12] Univ Alberta, Dept Med, Edmonton, AB T6G 2E1, Canada
[13] Univ Copenhagen, Novo Nordisk Fdn Ctr Basic Metab Res, DK-1168 Copenhagen, Denmark
[14] Univ Copenhagen, Dept Biol, DK-1168 Copenhagen, Denmark
Keywords
whole-genome shotgun; DNA sequencing; Illumina; flax; Malpighiales; industrial crops; AMINO-ACID-SEQUENCES; NUCLEAR-DNA CONTENT; LTR RETROTRANSPOSONS; CODING DNA; RNA GENES; DATABASE; PROGRAM; ORGANIZATION; ALIGNMENT; L;
DOI
10.1111/j.1365-313X.2012.05093.x
Chinese Library Classification number
Q94 [Botany];
Discipline classification code
071001;
Abstract
Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43 384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species.
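The abstract reports assembly contiguity as N50 values (scaffolds and contigs). As a reader aid, here is a minimal sketch of how N50 is conventionally computed: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name `n50` and the toy lengths are hypothetical illustrations, not data or code from the paper, and the authors' own tooling may differ.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    Conventional definition: sort lengths from longest to shortest and
    return the length at which the cumulative sum first reaches at least
    half of the total. This is an illustrative sketch, not the paper's code.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, NOT values from the flax assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Arithmetic implied by the abstract's own figures: 302 Mb of sequence
# at an estimated 81% coverage corresponds to roughly 373 Mb in total.
implied_genome_mb = 302 / 0.81
```

In the toy list the total is 300, so the running sum reaches the halfway point of 150 at the second-longest sequence.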
Pages: 461-473
Number of pages: 13