Combining pairwise-sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships

Cited by: 226
Authors
Liao, L
Noble, WS
Affiliations
[1] Univ Delaware, Dept Comp & Informat Sci, Newark, DE 19716 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
Keywords
pairwise sequence comparison; homology; detection; support vector machines
DOI
10.1089/106652703322756113
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more proteins whose structure or function is already known. Toward this end, we propose a means of representing proteins using pairwise sequence similarity scores. This representation, combined with a discriminative classification algorithm known as the support vector machine (SVM), provides a powerful means of detecting subtle structural and evolutionary relationships among proteins. The algorithm, called SVM-pairwise, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better performance than SVM-Fisher, profile HMMs, and PSI-BLAST.
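The representation described in the abstract can be illustrated with a minimal sketch: each protein is mapped to a vector whose entries are its similarity scores against a fixed set of training sequences, and those vectors are what a discriminative classifier such as an SVM would consume. The scoring scheme here (a Smith-Waterman local alignment with flat match, mismatch, and gap values) and the names `smith_waterman` and `pairwise_features` are assumptions made for illustration. The abstract does not specify which similarity score or parameters the authors used.

```python
from typing import List, Sequence


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b.

    Assumption: a toy scoring scheme with a linear gap penalty,
    not a substitution matrix such as BLOSUM62.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_features(seq: str, train_seqs: Sequence[str]) -> List[int]:
    """Represent seq as its similarity scores against every training sequence."""
    return [smith_waterman(seq, t) for t in train_seqs]


# Hypothetical toy sequences, not taken from SCOP.
train = ["ACDE", "CCCC"]
features = pairwise_features("ACDE", train)
```

Each feature vector has one coordinate per training sequence, so its length is fixed by the size of the training set. The vectors could then be passed to any SVM implementation to separate members of a family from non-members.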
Pages: 857-868
Page count: 12