Graphlet Kernels for Prediction of Functional Residues in Protein Structures

Cited by: 41
Authors
Vacic, Vladimir [2]
Iakoucheva, Lilia M. [3]
Lonardi, Stefano [2]
Radivojac, Predrag [1]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47408 USA
[2] Univ Calif Riverside, Dept Comp Sci & Engn, Riverside, CA 92521 USA
[3] Rockefeller Univ, Lab Stat Genet, New York, NY 10021 USA
Keywords
algorithms; graphs; kernel methods; machine learning; protein structure; protein function; 3D COORDINATE TEMPLATES; TO-ORDER TRANSITION; PHOSPHORYLATION SITES; PHOSPHOPROTEOMIC ANALYSIS; ENERGY LANDSCAPE; STRING KERNELS; BINDING-SITES; SEQUENCE; IDENTIFICATION; DETERMINANTS;
DOI
10.1089/cmb.2009.0029
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
We introduce a novel graph-based kernel method for annotating functional residues in protein structures. A structure is first modeled as a protein contact graph, where nodes correspond to residues and edges connect spatially neighboring residues. Each vertex in the graph is then represented as a vector of counts of labeled non-isomorphic subgraphs (graphlets), centered on the vertex of interest. A similarity measure between two vertices is expressed as the inner product of their respective count vectors and is used in a supervised learning framework to classify protein residues. We evaluated our method on two function prediction problems: identification of catalytic residues in proteins, which is a well-studied problem suitable for benchmarking, and a much less explored problem of predicting phosphorylation sites in protein structures. The performance of the graphlet kernel approach was then compared against two alternative methods, a sequence-based predictor and our implementation of the FEATURE framework. On both tasks, the graphlet kernel performed favorably; however, the margin of difference was considerably higher on the problem of phosphorylation site prediction. While there is data that phosphorylation sites are preferentially positioned in intrinsically disordered regions, we provide evidence that for the sites that are located in structured regions, neither the surface accessibility alone nor the averaged measures calculated from the residue microenvironments utilized by FEATURE were sufficient to achieve high accuracy. The key benefit of the graphlet representation is its ability to capture neighborhood similarities in protein structures via enumerating the patterns of local connectivity in the corresponding labeled graphs.
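The pipeline described in the abstract (contact graph, per-vertex graphlet count vector, inner-product kernel) can be illustrated with a minimal Python sketch. It is a simplified illustration, not the authors' implementation: it counts only rooted graphlets of up to three nodes, the 6.0 Å contact threshold is an assumed value, and the function names `contact_graph`, `graphlet_counts`, and `graphlet_kernel` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def contact_graph(coords, threshold=6.0):
    """Build an undirected contact graph: residues i and j are connected
    when their representative atoms lie within `threshold` angstroms.
    The 6.0 default is an assumption, not a value taken from the abstract."""
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graphlet_counts(adj, labels, v):
    """Count labeled graphlets of 1 to 3 nodes rooted at vertex v.
    Labels at symmetric positions are sorted so that isomorphic
    labeled graphlets map to the same key."""
    counts = Counter()
    lv = labels[v]
    counts[("g1", lv)] += 1                      # the vertex itself
    for u in adj[v]:
        counts[("g2", lv, labels[u])] += 1       # edge v-u
        for w in adj[u]:
            # induced path v-u-w with the root at one end
            if w != v and w not in adj[v]:
                counts[("path_end", lv, labels[u], labels[w])] += 1
    for u, w in combinations(sorted(adj[v]), 2):
        pair = tuple(sorted((labels[u], labels[w])))
        # two neighbors of v form a triangle if adjacent, else a path
        # with the root in the middle
        kind = "triangle" if w in adj[u] else "path_mid"
        counts[(kind, lv) + pair] += 1
    return counts


def graphlet_kernel(c1, c2):
    """Similarity of two vertices: inner product of their count vectors."""
    return sum(c1[k] * c2[k] for k in c1.keys() & c2.keys())


# Toy usage: four residues on a line, 5 angstroms apart (a path graph).
coords = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0)]
labels = ["S", "A", "S", "A"]  # one-letter residue types as vertex labels
adj = contact_graph(coords)
similarity = graphlet_kernel(graphlet_counts(adj, labels, 0),
                             graphlet_counts(adj, labels, 2))
```

The resulting kernel values could then be supplied to any kernel-based classifier as a precomputed Gram matrix; larger graphlets would extend `graphlet_counts` at the cost of more enumeration.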
Pages: 55-72
Page count: 18