The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes

Cited by: 145
Authors
Lee, Y [1]
Tsai, J [1]
Sunkara, S [1]
Karamycheva, S [1]
Pertea, G [1]
Sultana, R [1]
Antonescu, V [1]
Chan, A [1]
Cheung, F [1]
Quackenbush, J [1]
Affiliation
[1] Inst Genom Res, Rockville, MD 20850 USA
Funding
U.S. National Science Foundation
DOI
10.1093/nar/gki064
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification codes
071010; 081704
Abstract
Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis.
Pages: D71-D74
Number of pages: 4