Data mining the protein data bank: automatic detection and assignment of carbohydrate structures

Cited by: 87
Authors
Lütteke, T [1]
Frank, M [1]
von der Lieth, CW [1]
Affiliations
[1] German Canc Res Ctr, Cent Spectroscop Dept, INF 280, Heidelberg, Germany
Keywords
data analysis; 3D structure database; glycosylation; bioinformatics; algorithm
DOI
10.1016/j.carres.2003.09.038
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Knowledge of the 3D structure of glycans is a prerequisite for a complete understanding of the biological processes in which glycoproteins are involved. However, due to a lack of standardised nomenclature, carbohydrate compounds are difficult to locate within the Protein Data Bank (PDB). Using an algorithm that detects carbohydrate structures from element types and atom coordinates alone, we were able to detect 1663 entries containing a total of 5647 carbohydrate chains. The majority of chains are found to be N-glycosidically bound. Noncovalently bound ligands are also frequent, while O-glycans form a minority. About 30% of all carbohydrate-containing PDB entries contain one or more errors. The automatic assignment of carbohydrate structures in PDB entries will improve the cross-linking of glycobiology resources with genomic and proteomic data collections, which will be an important issue for the upcoming glycomics projects. By aiding in the detection of erroneous annotations and structures, the algorithm might also help to increase database quality. © 2003 Elsevier Ltd. All rights reserved.
Pages: 1015-1020
Page count: 6
Related papers
18 in total
  • [1] On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database
    Apweiler, R
    Hermjakob, H
    Sharon, N
    [J]. BIOCHIMICA ET BIOPHYSICA ACTA-GENERAL SUBJECTS, 1999, 1473 (01) : 4 - 8
  • [2] The Protein Data Bank
    Berman, HM
    Westbrook, J
    Feng, Z
    Gilliland, G
    Bhat, TN
    Weissig, H
    Shindyalov, IN
    Bourne, PE
    [J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 235 - 242
  • [3] LINUCS: LInear Notation for Unique Description of Carbohydrate Sequences
    Bohne-Lang, A
    Lang, E
    Förster, T
    von der Lieth, CW
    [J]. CARBOHYDRATE RESEARCH, 2001, 336 (01) : 1 - 11
  • [4] Analysis of N-linked oligosaccharides: progress towards the characterisation of glycoprotein-linked carbohydrates
    Charlwood, J
    Bryant, D
    Skehel, JM
    Camilleri, P
    [J]. BIOMOLECULAR ENGINEERING, 2001, 18 (05) : 229 - 240
  • [5] Intracellular functions of N-linked glycans
    Helenius, A
    Aebi, M
    [J]. SCIENCE, 2001, 291 (5512) : 2364 - 2369
  • [6] Errors in protein structures
    Hooft, RWW
    Vriend, G
    Sander, C
    Abola, EE
    [J]. NATURE, 1996, 381 (6580) : 272 - 272
  • [7] Stereochemistry of the N-glycosylation sites in glycoproteins
    Imberty, A
    Perez, S
    [J]. PROTEIN ENGINEERING, 1995, 8 (07) : 699 - 709
  • [8] Jung E, 2001, PROTEOMICS, V1, P262, DOI 10.1002/1615-9861(200102)1:2<262::AID-PROT262>3.3.CO;2-R
  • [9] Solvent interactions determine carbohydrate conformation
    Kirschner, KN
    Woods, RJ
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2001, 98 (19) : 10541 - 10545