Sequencing Medicago truncatula expressed sequenced tags using 454 Life Sciences technology

Cited by: 171
Authors
Cheung, Foo
Haas, Brian J.
Goldberg, Susanne M. D.
May, Gregory D.
Xiao, Yongli
Town, Christopher D.
Affiliations
[1] Inst Genom Res, Rockville, MD 20850 USA
[2] J Craig Venter Inst, Rockville, MD 20850 USA
[3] Natl Ctr Genome Resources, Santa Fe, NM 87508 USA
DOI
10.1186/1471-2164-7-272
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: In this study, we addressed whether a single 454 Life Sciences GS20 sequencing run provides new gene discovery from a normalized cDNA library, and whether the short reads produced by this technology are of value in gene structure annotation.
Results: A single 454 GS20 sequencing run on adapter-ligated cDNA from a normalized cDNA library generated 292,465 reads, which were reduced to 252,384 reads with an average read length of 92 nucleotides after cleaning. Clustering and assembly yielded 184,599 unique sequences containing over 400 SSRs. The 454 sequences generated hits to more genes than a comparable amount of sequence from MtGI. Although short, the 454 reads are of sufficient length to map to a unique genome location as effectively as longer ESTs produced by conventional sequencing. Functional interpretation of the sequences was carried out by Gene Ontology assignments from matches to Arabidopsis and was shown to cover a broad range of GO categories. 53,796 assemblies and singletons (29%) had no match in the existing MtGI. Within the previously unobserved Medicago transcripts, thousands had matches in a comprehensive protein database and one or more of the TIGR Plant Gene Indices. Approximately 20% of these novel sequences could be found in the Medicago genome sequence. A total of 70,026 reads generated by the 454 technology were mapped to 785 Medicago finished BACs using PASA, and over 1,000 gene models required modification. In parallel with 454 sequencing, 4,445 5' reads were generated by conventional sequencing from the same library; the assembled sequences showed that the library contains about 52% full-length cDNAs encoding proteins from 50 to over 500 amino acids in length.
Conclusion: Because of the large number of reads it affords, 454 DNA sequencing technology is effective in revealing the expression of transcripts from a broad range of GO categories and captures many rare transcripts in normalized cDNA libraries, although only a limited portion of their sequence is uncovered. As with longer ESTs, 454 reads can be mapped uniquely onto genomic sequence to provide support for, and modifications of, gene predictions.
Pages: 10