Remote homolog detection using local sequence-structure correlations

Times cited: 30
Authors
Hou, YN [1]
Hsu, W
Lee, ML
Bystroff, C
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 119260, Singapore
[2] Rensselaer Polytech Inst, Dept Biol, Troy, NY 12180 USA
Keywords
remote homology; local structure; support vector machines; hidden Markov model; protein folding; I-sites; HMMSTR
DOI
10.1002/prot.20221
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Remote homology detection refers to the detection of structural homology in proteins when there is little or no sequence similarity. In this article, we present a remote homolog detection method called SVM-HMMSTR that overcomes the reliance on detectable sequence similarity by transforming the sequences into strings of hidden Markov states that represent local folding motif patterns. These state strings are transformed into fixed-dimension feature vectors for input to a support vector machine. Two sets of features are defined: an order-independent feature set that captures the amino acid and local structure composition, and an order-dependent feature set that captures the sequential ordering of the local structures. Tests using the Structural Classification of Proteins (SCOP) 1.53 data set show that SVM-HMMSTR gives a significant improvement over several current methods. (C) 2004 Wiley-Liss, Inc.
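The abstract describes turning a variable-length sequence and its decoded state string into a fixed-dimension vector. The sketch below illustrates that idea under loudly stated assumptions: the state alphabet `STATES` is a hypothetical five-letter stand-in (HMMSTR's real state set is far larger), composition is taken as plain relative frequencies, and the "order-dependent" features are approximated by adjacent state-pair frequencies. The paper's exact feature definitions are not given in this record and may differ; all function names here are illustrative.

```python
from collections import Counter
from itertools import product

# Standard 20 amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# HYPOTHETICAL toy state labels standing in for HMMSTR states (not the real set).
STATES = "HEGBL"


def composition(symbols: str, alphabet: str) -> list[float]:
    """Order-independent features: relative frequency of each alphabet symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return [counts[a] / n if n else 0.0 for a in alphabet]


def pair_frequencies(symbols: str, alphabet: str) -> list[float]:
    """Order-dependent features (assumed stand-in): relative frequency of
    each adjacent symbol pair, in a fixed order over alphabet x alphabet."""
    pairs = Counter(zip(symbols, symbols[1:]))
    total = max(len(symbols) - 1, 0)
    return [pairs[p] / total if total else 0.0
            for p in product(alphabet, repeat=2)]


def feature_vector(sequence: str, state_path: str) -> list[float]:
    """Concatenate features into a vector whose length does not depend
    on the sequence length: 20 + |STATES| + |STATES|**2 entries."""
    if len(sequence) != len(state_path):
        raise ValueError("sequence and state path must have equal length")
    return (composition(sequence, AMINO_ACIDS)
            + composition(state_path, STATES)
            + pair_frequencies(state_path, STATES))


if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    path = "LLHHHHHHLL"  # hypothetical decoded state string
    vec = feature_vector(seq, path)
    print(len(vec))  # 50 with the toy alphabets above
```

Because every protein maps to a vector of the same length, such vectors can be fed directly to any standard SVM implementation; the choice of kernel and any weighting of states are left open here.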
Pages: 518-530
Number of pages: 13