Bayesian models and Markov chain Monte Carlo methods for protein motifs with the secondary characteristics

Cited by: 5
Authors
Xie, J [1]
Kim, NK [1]
Affiliation
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Keywords
amino acid side-chain polarity; Bayesian models; Kullback-Leibler information; MCMC; protein motifs
DOI
10.1089/cmb.2005.12.952
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may imply functional or structural core regions. However, the existing methods often have difficulty aligning multiple proteins when sequence residue identities are low (e.g., less than 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only in terms of specific sites characterized by amino acid frequency vectors, but also as a combination of secondary characteristics such as hydrophobicity and polarity. MCMC methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed, involving transitions between state spaces of different dimensions. The proposed methods were validated in a simulation study and then tested on two real datasets: a group of helix-turn-helix proteins and one set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach performed better than a typical Gibbs sampling approach that is based only on an amino acid model.
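To illustrate why a secondary characteristic can expose a motif that residue identities hide, the following minimal Python sketch scores each column of a set of aligned segments by its Kullback-Leibler information relative to a background distribution, once over the 20 amino acids and once over a reduced polarity alphabet. It is not the model from the article: the four-class polarity grouping, the uniform background, the pseudocount, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed polarity grouping, for illustration only (not taken from the article).
POLARITY_GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}
POLARITY_CLASS = {r: cls for cls, residues in POLARITY_GROUPS.items() for r in residues}
CLASSES = list(POLARITY_GROUPS)

# Assumed uniform residue background; the class background follows from it.
RESIDUE_BACKGROUND = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
CLASS_BACKGROUND = {
    cls: len(residues) / len(AMINO_ACIDS) for cls, residues in POLARITY_GROUPS.items()
}


def column_frequencies(symbols, alphabet, pseudocount=1.0):
    """Estimate a frequency vector for one motif column, smoothed by a pseudocount."""
    counts = Counter(symbols)
    total = len(symbols) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


def kl_information(p, q):
    """Kullback-Leibler information of p relative to q, in bits."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in p if p[a] > 0)


def motif_information(segments, alphabet, background, mapping=None):
    """Per-column KL information for equal-length aligned segments.

    If `mapping` is given, each residue is first replaced by its class.
    """
    width = len(segments[0])
    info = []
    for j in range(width):
        column = [s[j] for s in segments]
        if mapping is not None:
            column = [mapping[c] for c in column]
        p = column_frequencies(column, alphabet)
        info.append(kl_information(p, background))
    return info


# Hypothetical aligned segments: column 0 holds four different nonpolar residues.
segments = ["LKDA", "IRES", "VKDT", "MRET"]
residue_info = motif_information(segments, AMINO_ACIDS, RESIDUE_BACKGROUND)
class_info = motif_information(segments, CLASSES, CLASS_BACKGROUND, POLARITY_CLASS)
print("residue-level:", [round(x, 3) for x in residue_info])
print("class-level:  ", [round(x, 3) for x in class_info])
```

In this toy input the first column has no repeated residue, so it carries little information at the residue level, yet every residue falls in the same assumed polarity class, so the class-level score is higher. A full sampler would also have to search over segment positions and over which columns use which description, which this sketch does not attempt.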
Pages: 952-970
Page count: 19