Transmembrane protein topology prediction using support vector machines

Times cited: 316
Authors
Nugent, Timothy [1]
Jones, David T. [1]
Affiliation
[1] UCL, Dept Comp Sci, Bioinformat Grp, London WC1E 6BT, England
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Funding
Biotechnology and Biological Sciences Research Council (UK);
Keywords
SECONDARY STRUCTURE; MEMBRANE-PROTEINS; SIGNAL PEPTIDES; WEB SERVER; CLASSIFICATION; INFORMATION; SEQUENCE; DATABASE; MODEL; CLN3;
DOI
10.1186/1471-2105-10-159
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high-quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated.

Results: We present a support vector machine (SVM)-based TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves a topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy, respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from http://bioinf.cs.ucl.ac.uk/psipred/.

Conclusion: The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, makes this method ideally suited to whole-genome annotation of alpha-helical transmembrane proteins.
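The abstract reports per-residue topology results but does not say how a sequence is turned into SVM inputs. One common way to frame per-residue prediction as classification, shown here only as a hypothetical sketch, is to encode a fixed-size window centred on each residue as one feature vector. The one-hot scheme, the extra terminus channel, the `half_width` parameter and the function name `encode_windows` are all assumptions for illustration, not the published method's features (which may instead be derived from sequence profiles).

```python
# Hypothetical sketch: sliding-window one-hot encoding of a protein sequence,
# producing one feature vector per residue for a per-residue SVM classifier.
# Window size and encoding are illustrative assumptions, not the paper's features.
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"            # 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
N_CHANNELS = len(AMINO_ACIDS) + 1                # extra channel marks positions beyond the termini


def encode_windows(sequence: str, half_width: int = 3) -> List[List[float]]:
    """Return one flattened window vector per residue of `sequence`.

    Each vector covers positions i - half_width .. i + half_width and has
    (2 * half_width + 1) * N_CHANNELS entries. Positions that fall off either
    end of the chain set the terminus channel; unknown residues (e.g. 'X')
    leave every channel of their block at zero.
    """
    sequence = sequence.upper()
    vectors: List[List[float]] = []
    for centre in range(len(sequence)):
        features: List[float] = []
        for pos in range(centre - half_width, centre + half_width + 1):
            block = [0.0] * N_CHANNELS
            if pos < 0 or pos >= len(sequence):
                block[-1] = 1.0                  # position lies outside the chain
            elif sequence[pos] in AA_INDEX:
                block[AA_INDEX[sequence[pos]]] = 1.0
            features.extend(block)
        vectors.append(features)
    return vectors


if __name__ == "__main__":
    toy = "MKLLVLGAW"                            # toy sequence, not from the paper's data set
    windows = encode_windows(toy, half_width=2)
    print(len(windows), len(windows[0]))         # 9 residues, 5 * 21 = 105 features each
```

Each vector would then be paired with a per-residue label (for example membrane, inside or outside) and passed to an SVM library; the label set, kernel and window width here are likewise assumptions rather than details given in the abstract.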
Pages: 11