An improved genome release (version Mt4.0) for the model legume Medicago truncatula

Cited by: 281
Authors
Tang, Haibao [1]
Krishnakumar, Vivek [1]
Bidwell, Shelby [1]
Rosen, Benjamin [1]
Chan, Agnes [1]
Zhou, Shiguo [2]
Gentzbittel, Laurent [3]
Childs, Kevin L. [4]
Yandell, Mark [5]
Gundlach, Heidrun [6]
Mayer, Klaus F. X. [6]
Schwartz, David C. [2]
Town, Christopher D. [1]
Affiliations
[1] J Craig Venter Inst, Rockville, MD 20850 USA
[2] Univ Wisconsin, Dept Chem, Lab Mol & Computat Genom, Madison, WI 53706 USA
[3] Univ Toulouse, INP ENSAT, CNRS, Lab Ecol Fonct & Environm, Toulouse, France
[4] Michigan State Univ, Dept Plant Biol, E Lansing, MI 48824 USA
[5] Univ Utah, Dept Human Genet, Salt Lake City, UT USA
[6] German Res Ctr Environm Hlth GmbH, Helmholtz Ctr Munich, MIPS IBIS Inst Bioinformat & Syst Biol, Neuherberg, Germany
Source
BMC GENOMICS | 2014 / Volume 15
Funding
National Science Foundation (United States)
Keywords
Medicago; Legume; Genome assembly; Gene annotation; Optical map; RNA-SEQ DATA; RICE GENOME; ANNOTATION; SEQUENCE; ALIGNMENT; GENES; ASSEMBLIES; DISCOVERY; EVOLUTION; PIPELINE;
DOI
10.1186/1471-2164-15-312
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Medicago truncatula, a close relative of alfalfa, is a preeminent model for studying nitrogen fixation, symbiosis, and legume genomics. The Medicago sequencing project began in 2003 with the goal of deciphering sequences originating from the euchromatic portion of the genome. The initial sequencing approach was based on a BAC tiling path, culminating in a BAC-based assembly (Mt3.5) as well as an in-depth analysis of the genome published in 2011.
Results: Here we describe a further improved and refined version of the M. truncatula genome (Mt4.0) based on de novo whole-genome shotgun assembly of a majority of Illumina and 454 reads using ALLPATHS-LG. The ALLPATHS-LG scaffolds were anchored onto the pseudomolecules on the basis of alignments to both the optical map and the genotyping-by-sequencing (GBS) map. The Mt4.0 pseudomolecules encompass ~360 Mb of actual sequence spanning 390 Mb, of which ~330 Mb align perfectly with the optical map, a drastic improvement over the BAC-based Mt3.5, which contained only 70% of the sequence (~250 Mb) of the current version. Most of the sequences and genes that previously resided on the unanchored portion of Mt3.5 have now been incorporated into the Mt4.0 pseudomolecules, with the exception of ~28 Mb of unplaced sequences. With regard to gene annotation, the genome has been re-annotated through our gene prediction pipeline, which integrates EST, RNA-seq, protein and gene prediction evidence. A total of 50,894 genes (31,661 high confidence and 19,233 low confidence) are included in Mt4.0; these overlap ~82% of the gene loci annotated in Mt3.5. Of the remaining genes, 14% of the Mt3.5 genes have been deprecated to an "unsupported" status and 4% are absent from the Mt4.0 predictions.
Conclusions: Mt4.0 and its associated resources, such as genome browsers, BLAST-able datasets and gene information pages, can be found on the JCVI Medicago web site (http://www.jcvi.org/medicago). The assembly and annotation have been deposited in GenBank (BioProject: PRJNA10791). The heavily curated chromosomal sequences and associated gene models of Medicago will serve as a better reference for legume biology and comparative genomics.
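The figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers, not part of the authors' pipeline. The variable names are illustrative, values marked "~" in the abstract are treated as approximate, and reading the difference between the 390 Mb span and the ~360 Mb of sequence as gaps is an assumption.

```python
# Figures as quoted in the abstract; "~" values are approximate.
HIGH_CONFIDENCE_GENES = 31_661
LOW_CONFIDENCE_GENES = 19_233
TOTAL_GENES = 50_894

MT40_SEQUENCE_MB = 360  # ~360 Mb of actual sequence in Mt4.0 pseudomolecules
MT40_SPAN_MB = 390      # total span of the Mt4.0 pseudomolecules
MT35_SEQUENCE_MB = 250  # ~250 Mb in the BAC-based Mt3.5

# Fate of Mt3.5 gene loci in Mt4.0, in percent of Mt3.5 loci.
MT35_LOCI_FATE = {"overlapped": 82, "unsupported": 14, "absent": 4}

# High- and low-confidence genes should add up to the reported total.
genes_sum = HIGH_CONFIDENCE_GENES + LOW_CONFIDENCE_GENES

# Share of the Mt4.0 sequence that Mt3.5 already contained (reported as 70%).
mt35_share = MT35_SEQUENCE_MB / MT40_SEQUENCE_MB

# Assumption: span minus sequence is taken here to be gap (unresolved) bases.
gap_mb = MT40_SPAN_MB - MT40_SEQUENCE_MB

print(f"gene total matches: {genes_sum == TOTAL_GENES}")
print(f"Mt3.5 / Mt4.0 sequence: {mt35_share:.1%}")
print(f"span minus sequence: {gap_mb} Mb")
print(f"Mt3.5 loci accounted for: {sum(MT35_LOCI_FATE.values())}%")
```

The rounded inputs give a Mt3.5 share slightly under the reported 70%, which is expected given that both sequence totals are approximate.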
Pages: 14