The Ensembl automatic gene annotation system

Times cited: 269
Authors
Curwen, V
Eyras, E
Andrews, TD
Clarke, L
Mongin, E
Searle, SMJ
Clamp, M
Affiliations
[1] Wellcome Trust Sanger Institute, Cambridge, England
[2] EMBL European Bioinformatics Institute, Cambridge CB10 1SD, England
[3] Broad Institute, Cambridge, MA 02141, USA
DOI
10.1101/gr.1858004
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
As more genomes are sequenced, there is an increasing need for automated first-pass annotation that allows timely access to important genomic information. The Ensembl gene-building system enables fast automated annotation of eukaryotic genomes. It annotates genes based on evidence derived from known protein, cDNA, and EST sequences. The gene-building system rests on top of the core Ensembl (MySQL) database schema and Perl Application Programming Interface (API), and the data generated are accessible through the Ensembl genome browser (http://www.ensembl.org). To date, the Ensembl predicted gene sets are available for the A. gambiae, C. briggsae, zebrafish, mouse, rat, and human genomes and have been heavily relied upon in the publication of the human, mouse, rat, and A. gambiae genome sequence analyses. Here we describe in detail the gene-building system and the algorithms involved. All code and data are freely available from http://www.ensembl.org.
Pages: 942-950
Number of pages: 9
References (28 in total)
[1]   Gapped BLAST and PSI-BLAST: a new generation of protein database search programs [J].
Altschul, SF ;
Madden, TL ;
Schaffer, AA ;
Zhang, JH ;
Zhang, Z ;
Miller, W ;
Lipman, DJ .
NUCLEIC ACIDS RESEARCH, 1997, 25 (17) :3389-3402
[2]   Basic local alignment search tool [J].
Altschul, SF ;
Gish, W ;
Miller, W ;
Myers, EW ;
Lipman, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[3]   The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 [J].
Boeckmann, B ;
Bairoch, A ;
Apweiler, R ;
Blatter, MC ;
Estreicher, A ;
Gasteiger, E ;
Martin, MJ ;
Michoud, K ;
O'Donovan, C ;
Phan, I ;
Pilbout, S ;
Schneider, M .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :365-370
[4]   dbEST - database for expressed sequence tags [J].
Boguski, MS ;
Lowe, TMJ ;
Tolstoshev, CM .
NATURE GENETICS, 1993, 4 (04) :332-333
[5]   Prediction of complete gene structures in human genomic DNA [J].
Burge, C ;
Karlin, S .
JOURNAL OF MOLECULAR BIOLOGY, 1997, 268 (01) :78-94
[6]   Evaluation of gene structure prediction programs [J].
Burset, M ;
Guigo, R .
GENOMICS, 1996, 34 (03) :353-367
[7]   The DNA sequence and comparative analysis of human chromosome 20 [J].
Deloukas, P ;
Matthews, LH ;
Ashurst, J ;
Burton, J ;
Gilbert, JGR ;
Jones, M ;
Stavrides, G ;
Almeida, JP ;
Babbage, AK ;
Bagguley, CL ;
Bailey, J ;
Barlow, KF ;
Bates, KN ;
Beard, LM ;
Beare, DM ;
Beasley, OP ;
Bird, CP ;
Blakey, SE ;
Bridgeman, AM ;
Brown, AJ ;
Buck, D ;
Burrill, W ;
Butler, AP ;
Carder, C ;
Carter, NP ;
Chapman, JC ;
Clamp, M ;
Clark, G ;
Clark, LN ;
Clark, SY ;
Clee, CM ;
Clegg, S ;
Cobley, VE ;
Collier, RE ;
Connor, R ;
Corby, NR ;
Coulson, A ;
Coville, GJ ;
Deadman, R ;
Dhami, P ;
Dunn, M ;
Ellington, AG ;
Frankland, JA ;
Fraser, A ;
French, L ;
Garner, P ;
Grafham, DV ;
Griffiths, C ;
Griffiths, ND ;
Gwilliam, R .
NATURE, 2001, 414 (6866) :865-U3
[8]   Computational detection and location of transcription start sites in mammalian genomic DNA [J].
Down, TA ;
Hubbard, TJP .
GENOME RESEARCH, 2002, 12 (03) :458-461
[9]   An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks [J].
Gaunt, MW ;
Miles, MA .
MOLECULAR BIOLOGY AND EVOLUTION, 2002, 19 (05) :748-761
[10]   WormBase: a cross-species database for comparative genomics [J].
Harris, TW ;
Lee, R ;
Schwarz, E ;
Bradnam, K ;
Lawson, D ;
Chen, W ;
Blasier, D ;
Kenny, E ;
Cunningham, F ;
Kishore, R ;
Chan, J ;
Muller, HM ;
Petcherski, A ;
Thorisson, G ;
Day, A ;
Bieri, T ;
Rogers, A ;
Chen, CK ;
Spieth, J ;
Sternberg, P ;
Durbin, R ;
Stein, LD .
NUCLEIC ACIDS RESEARCH, 2003, 31 (01) :133-137