Mining frequent patterns in protein structures: a study of protease families

Times cited: 20
Authors
Chen, Shann-Ching [1, 2]
Bahar, Ivet [1]
Affiliations
[1] Univ Pittsburgh, Sch Med, Dept Mol Genet & Biochem, Ctr Computat Biol & Bioinformat, Pittsburgh, PA 15261 USA
[2] Carnegie Mellon Univ, Dept Biomed Engn, Pittsburgh, PA 15213 USA
DOI
10.1093/bioinformatics/bth912
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Analysis of protein sequence and structure databases usually reveals frequent patterns (FPs) associated with biological function. Data mining techniques generally consider the physicochemical and structural properties of amino acids and their microenvironment in the folded structures. Dynamics is not usually considered, although proteins are not static and, in many cases, their function relates to conformational mobility.

Results: This work describes a novel unsupervised learning approach to discover FPs in protein families, based on biochemical, geometric and dynamic features. Without any prior knowledge of functional motifs, the method discovers the FPs for each type of amino acid and identifies the conserved residues in three protease subfamilies: the chymotrypsin and subtilisin subfamilies of serine proteases and the papain subfamily of cysteine proteases. The catalytic triad residues are distinguished by their strong spatial coupling (high interconnectivity) to other conserved residues. Although the spatial arrangements of the catalytic residues in the two serine protease subfamilies are similar, their FPs are found to be quite different. The present approach appears to be a promising tool for detecting functional patterns in rapidly growing structure databases and for providing insights into the relationship among protein structure, dynamics and function.
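The abstract singles out catalytic residues by their "high interconnectivity" to other conserved residues. The record does not define that measure. The sketch below is only a hypothetical illustration of one simple way to quantify it. For each conserved residue, it counts how many other conserved residues have C-alpha atoms within a distance cutoff. The 7 Å default cutoff, the use of C-alpha coordinates, and the function names `interconnectivity` and `most_coupled` are all assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def interconnectivity(
    coords: Dict[int, Coord],
    conserved: Sequence[int],
    cutoff: float = 7.0,  # assumed contact cutoff in angstroms (hypothetical)
) -> Dict[int, int]:
    """Hypothetical measure: for each conserved residue, count the other
    conserved residues whose C-alpha atoms lie within `cutoff` of it."""
    counts = {res: 0 for res in conserved}
    for i, a in enumerate(conserved):
        for b in conserved[i + 1:]:
            # Euclidean distance between the two C-alpha positions
            if math.dist(coords[a], coords[b]) <= cutoff:
                counts[a] += 1
                counts[b] += 1
    return counts


def most_coupled(counts: Dict[int, int], top: int = 3) -> List[int]:
    """Return the `top` residues with the most contacts, breaking ties
    by the lower residue number."""
    return sorted(counts, key=lambda r: (-counts[r], r))[:top]


if __name__ == "__main__":
    # Toy coordinates for five residues (illustrative, not real protein data)
    toy = {
        1: (0.0, 0.0, 0.0),
        2: (5.0, 0.0, 0.0),
        3: (0.0, 5.0, 0.0),
        4: (4.0, 4.0, 0.0),
        5: (30.0, 0.0, 0.0),
    }
    counts = interconnectivity(toy, [1, 2, 3, 4, 5])
    print(counts)                # {1: 3, 2: 2, 3: 2, 4: 3, 5: 0}
    print(most_coupled(counts))  # [1, 4, 2]
```

In this toy example, residue 5 lies far from the others and gets no contacts. Residues 1 and 4 form the most tightly coupled pair of nodes. A real analysis would first read the coordinates from a structure file and take the conserved positions from a family alignment.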
Pages: 77-85
Number of pages: 9