Annotated Expressed Sequence Tags (ESTs) from pre-smolt Atlantic salmon (Salmo salar) in a searchable data resource

Cited by: 46
Authors
Adzhubei, Alexei A.
Vlasova, Anna V.
Hagen-Larsen, Heidi
Ruden, Torgeir A.
Laerdahl, Jon K.
Hoyheim, Bjorn
Affiliations
[1] Norwegian Sch Vet Sci, NO-0033 Oslo, Norway
[2] Univ Oslo, Ctr Biotechnol, NO-0317 Oslo, Norway
[3] Natl Hosp Norway, Inst Med Microbiol, Radiumhosp Med Ctr, NO-0027 Oslo, Norway
[4] Natl Hosp Norway, CMBN, Radiumhosp Med Ctr, NO-0027 Oslo, Norway
[5] RAS, VA Engelhardt Mol Biol Inst, Moscow 117901, Russia
Source
BMC GENOMICS | 2007 / Vol. 8
Keywords
SWISS-MODEL; LINKAGE MAP; MICROSATELLITE; GENES; TOOL;
DOI
10.1186/1471-2164-8-209
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: To identify as many different transcripts/genes in the Atlantic salmon genome as possible, it is crucial to acquire good cDNA libraries from different tissues and developmental stages, obtain their relevant sequences (ESTs or full-length sequences), and attempt to predict function. Such libraries allow identification of a large number of different transcripts and can provide valuable information on genes expressed in a particular tissue at a specific developmental stage. These data are important for constructing a microarray chip, identifying SNPs in coding regions, and the future identification of genes in the whole genome sequence. An important factor that determines the usefulness of generated data for biologists is efficient data access. Public searchable databases play a crucial role in providing such a service.
Description: Twenty-three Atlantic salmon cDNA libraries were constructed from 15 tissues, yielding nearly 155,000 clones. From these libraries 58,109 ESTs were generated, of which 57,212 were used for contig assembly. Following deletion of mitochondrial sequences, 55,118 EST sequences were submitted to GenBank. In all, 20,019 unique sequences, consisting of 6,424 contigs and 13,595 singlets, were generated. The Norwegian Salmon Genome Project Database has been constructed and annotation performed by the annotation transfer approach. Annotation was successful for 50.3% (10,075) of the sequences, and 6,113 sequences (30.5%) were annotated with Gene Ontology terms for molecular function, biological process and cellular component.
Conclusion: We describe the construction of cDNA libraries from juvenile/pre-smolt Atlantic salmon (Salmo salar), EST sequencing, clustering, and annotation by assigning putative function to the transcripts. These sequences represent 97% of all sequences submitted to GenBank from the pre-smoltification stage. The data have been grouped into datasets according to their source and type of annotation. Various data query options are offered, including searches on function assignments and Gene Ontology terms. Data delivery options include summaries for the datasets and their annotations, detailed self-explanatory annotations, and access to the original BLAST results and Gene Ontology annotation trees. Annotation searches indicated the potential presence of a relatively high number of immune-related genes in the dataset.
Pages: 15
Related papers
27 in total
  • [1] preAssemble: a tool for automatic sequencer trace data processing
    Adzhubei, AA
    Laerdahl, JK
    Vlasova, AV
    [J]. BMC BIOINFORMATICS, 2006, 7 (1)
  • [2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [3] [Anonymous], EVOLUTIONARY GENETIC
  • [4] dbEST - database for expressed sequence tags
    Boguski, MS
    Lowe, TMJ
    Tolstoshev, CM
    [J]. NATURE GENETICS, 1993, 4 (04) : 332 - 333
  • [5] Bonfield J., 1995, Staden Package
  • [6] Cairney M, 2000, MOL ECOL, V9, P2175, DOI 10.1046/j.1365-294X.2000.105312.x
  • [7] The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology
    Camon, E
    Magrane, M
    Barrell, D
    Lee, V
    Dimmer, E
    Maslen, J
    Binns, D
    Harte, N
    Lopez, R
    Apweiler, R
    [J]. NUCLEIC ACIDS RESEARCH, 2004, 32 : D262 - D266
  • [8] Base-calling of automated sequencer traces using phred. II. Error probabilities
    Ewing, B
    Green, P
    [J]. GENOME RESEARCH, 1998, 8 (03) : 186 - 194
  • [9] Base-calling of automated sequencer traces using phred. I. Accuracy assessment
    Ewing, B
    Hillier, L
    Wendl, MC
    Green, P
    [J]. GENOME RESEARCH, 1998, 8 (03) : 175 - 185
  • [10] A microsatellite linkage map for Atlantic salmon (Salmo salar)
    Gilbey, J
    Verspoor, E
    McLay, A
    Houlihan, D
    [J]. ANIMAL GENETICS, 2004, 35 (02) : 98 - 105