Computational prediction and analysis of protein γ-carboxylation sites based on a random forest method

Cited by: 42
Authors
Zhang, Ning [1]
Li, Bi-Qing [2]
Gao, Shan [3]
Ruan, Ji-Shou [3]
Cai, Yu-Dong [4]
Affiliations
[1] Tianjin Univ, Tianjin Key Lab BME Measurement, Dept Biomed Engn, Tianjin 300072, Peoples R China
[2] Chinese Acad Sci, Shanghai Inst Biol Sci, Key Lab Syst Biol, Shanghai 200031, Peoples R China
[3] Nankai Univ, Coll Math Sci, Tianjin 300071, Peoples R China
[4] Shanghai Univ, Inst Syst Biol, Shanghai 200444, Peoples R China
Keywords
AMINO-ACID-COMPOSITION; HEXAPEPTIDE DISULFIDE LOOP; CARBOXYGLUTAMIC-ACID; SUBCELLULAR LOCATION; POSTTRANSLATIONAL MODIFICATION; GLUTAMYL CARBOXYLATION; CONTAINING CONTRYPHAN; APOPTOSIS PROTEINS; RECOGNITION SITE; S-NITROSYLATION;
DOI
10.1039/c2mb25185j
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Glutamate gamma-carboxylation plays a pivotal part in a number of important human diseases. However, detecting protein gamma-carboxylation sites by traditional experimental approaches is often laborious and time-consuming. In this study, we made a first attempt at the computational prediction of protein gamma-carboxylation sites, developing a new prediction method based on Random Forest. The method achieved 90.44% accuracy and an MCC of 0.7739 on the training dataset, and 89.83% accuracy and an MCC of 0.7448 on the testing dataset. It considers several kinds of features: sequence conservation, residual disorder, secondary structure, solvent accessibility, physicochemical/biochemical properties and amino acid occurrence frequencies. A feature selection algorithm identified an optimal set of 327 features, which were taken to be the ones contributing most significantly to the prediction of protein gamma-carboxylation sites. Analysis of the optimal feature set indicated several important factors in determining gamma-carboxylation, and a possible consensus sequence of the gamma-carboxylation recognition site (gamma-CRS) was suggested. These findings may shed some light on the mechanisms of gamma-carboxylation and provide guidelines for experimental validation.
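The two metrics reported in the abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' code; the function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of sites classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when the denominator is zero (a common convention
    for the degenerate case where a row or column of the confusion
    matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only (NOT from the paper):
# 45 true positives, 45 true negatives, 5 false positives, 5 false negatives.
tp, tn, fp, fn = 45, 45, 5, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it stays informative when carboxylated and non-carboxylated sites are unevenly represented, which is presumably why the abstract reports both.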
Pages: 2946-2955
Number of pages: 10