Variable gap penalty for protein sequence-structure alignment

Cited by: 40
Authors
Madhusudhan, MS
Marti-Renom, MA
Sanchez, R
Sali, A [1]
Affiliations
[1] Univ Calif San Francisco, Dept Biopharmaceut Sci, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Dept Pharmaceut Chem, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Calif Inst Quantitat Biomed Res, San Francisco, CA 94143 USA
[4] CUNY Mt Sinai Sch Med, Struct Biol Program, New York, NY 10029 USA
Keywords
comparative protein structure modeling; gap penalty function; homology modeling; sequence-structure alignment
DOI
10.1093/protein/gzj005
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The penalty for inserting gaps into an alignment between two protein sequences is a major determinant of the alignment accuracy. Here, we present an algorithm for finding a globally optimal alignment by dynamic programming that can use a variable gap penalty (VGP) function of any form. We also describe a specific function that depends on the structural context of an insertion or deletion. It penalizes gaps that are introduced within regions of regular secondary structure, buried regions, straight segments and also between two spatially distant residues. The parameters of the penalty function were optimized on a set of 240 sequence pairs of known structure, spanning the sequence identity range of 20-40%. We then tested the algorithm on another set of 238 sequence pairs of known structures. The use of the VGP function increases the number of correctly aligned residues from 81.0 to 84.5% in comparison with the optimized affine gap penalty function; this difference is statistically significant according to Student's t-test. We estimate that the new algorithm allows us to produce comparative models with an additional approximately 7 million accurately modeled residues in the approximately 1.1 million proteins that are detectably related to a known structure.
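The idea of a dynamic-programming alignment that accepts a gap penalty function of any form can be illustrated with a minimal sketch. This is not the authors' published algorithm or their VGP function; it is the generic cubic-time recurrence for arbitrary gap costs, and every name in it (`vgp_align_score`, `del_pen`, `ins_pen`, the toy `extra` per-residue penalties) is a hypothetical chosen for illustration. The structure-side deletion penalty receives the span of structure residues left unaligned, so it can depend on their context.

```python
from typing import Callable, Sequence


def vgp_align_score(
    structure: Sequence[str],
    sequence: Sequence[str],
    sub: Callable[[str, str], float],
    del_pen: Callable[[int, int], float],
    ins_pen: Callable[[int, int], float],
) -> float:
    """Best global alignment score with arbitrary, position-aware gap penalties.

    del_pen(start, end): cost of leaving structure[start:end] unaligned.
    ins_pen(i, k): cost of inserting k sequence residues after structure position i.
    Both are hypothetical interfaces, not taken from the paper.
    """
    n, m = len(structure), len(sequence)
    neg_inf = float("-inf")
    # S[i][j] = best score for aligning structure[:i] with sequence[:j]
    S = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    S[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                # Align structure[i-1] with sequence[j-1].
                best = S[i - 1][j - 1] + sub(structure[i - 1], sequence[j - 1])
            for k in range(1, i + 1):
                # A gap of length k that skips structure[i-k:i].
                best = max(best, S[i - k][j] - del_pen(i - k, i))
            for k in range(1, j + 1):
                # A gap of length k that skips sequence[j-k:j].
                best = max(best, S[i][j - k] - ins_pen(i, k))
            S[i][j] = best
    return S[n][m]


# Toy scoring, assumed for illustration only.
identity = lambda x, y: 1.0 if x == y else -1.0
affine_del = lambda start, end: 2.0 + 0.5 * (end - start)
affine_ins = lambda i, k: 2.0 + 0.5 * k

# Context-dependent variant: deleting structure residue 1 (imagine it sits
# inside a helix) costs an extra 10 units.
extra = [0.0, 10.0, 0.0, 0.0]
context_del = lambda start, end: affine_del(start, end) + sum(extra[start:end])

affine_score = vgp_align_score("ACGT", "AGT", identity, affine_del, affine_ins)
context_score = vgp_align_score("ACGT", "AGT", identity, context_del, affine_ins)
```

With the affine penalty the gap falls opposite `C`, whereas the context-dependent penalty pushes the gap to a cheaper position at the cost of a mismatch, which lowers the score. Because the recurrence tries every gap length at every cell, it runs in roughly cubic time, unlike the quadratic-time recurrence available for affine penalties.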
Pages: 129-133
Page count: 5