The Universal Protein Resource (UniProt)

Cited by: 1187
Authors
Bairoch, A
Apweiler, R
Wu, CH
Barker, WC
Boeckmann, B
Ferro, S
Gasteiger, E
Huang, HZ
Lopez, R
Magrane, M
Martin, MJ
Natale, DA
O'Donovan, C
Redaschi, N
Yeh, LSL
Affiliations
[1] European Bioinformat Inst, EMBL Outstn, Cambridge CB10 1SD, England
[2] Ctr Med Univ Geneva, Swiss Inst Bioinformat, CH-1211 Geneva 4, Switzerland
[3] Georgetown Univ, Med Ctr, Natl Biomed Res Fdn, Washington, DC 20057 USA
Funding
US National Science Foundation
DOI
10.1093/nar/gki070
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records were manually annotated or updated; we introduced a new comment line topic, TOXIC DOSE, to store information on the acute toxicity of a toxin; the UniProt keyword list was augmented with additional keywords; and we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file listing strains and their synonyms. Many new database cross-references were introduced, and we started to make use of Digital Object Identifiers. In collaboration with the Macromolecular Structure Database group at EBI, we also achieved improved integration with structural databases through residue-level mapping of sequences from Protein Data Bank entries onto the corresponding UniProt entries. For convenient sequence searches, we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data.
Pages: D154-D159
Page count: 6