Effect of training datasets on support vector machine prediction of protein-protein interactions

Cited by: 62
Authors
Lo, SL
Cai, CZ
Chen, YZ
Chung, MCM
Affiliations
[1] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
[2] Natl Univ Singapore, Bioproc Technol Inst, Singapore 117597, Singapore
[3] Natl Univ Singapore, Dept Computat Sci, Singapore 117597, Singapore
[4] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
Keywords
database of interacting proteins; protein function prediction; protein-protein interaction; shuffled sequence; support vector machine; SVMlight;
DOI
10.1002/pmic.200401118
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the Support Vector Machine (SVM), has recently been explored for predicting protein-protein interactions using artificial shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how the prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results obtained with real protein sequences against those obtained with shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis combined with subcellular localization information for interacting proteins found in the Database of Interacting Proteins. Prediction accuracy is 76.9% using real protein sequences, compared to 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be harder than separating a set of real protein sequences from a set of artificial sequences. Training an SVM classification system on real protein sequences is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful for developing an SVM classification system that facilitates protein-protein interaction prediction.
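The shuffled-sequence negatives mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' or Bock and Gough's code: the function names, the toy sequences, and the choice to shuffle both partners of each pair are assumptions made purely for illustration. The sketch shows only that shuffling preserves amino acid composition while destroying residue order.

```python
import random
from collections import Counter


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a random permutation of the residues in seq.

    Length and amino acid composition are preserved; residue order is not.
    """
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def make_shuffled_negatives(pairs, seed=0):
    """Build hypothetical noninteracting pairs by shuffling both partners.

    Assumption: both sequences of each interacting pair are shuffled.
    """
    rng = random.Random(seed)
    return [(shuffle_sequence(a, rng), shuffle_sequence(b, rng)) for a, b in pairs]


# Toy sequences invented for illustration; they are not taken from the
# Database of Interacting Proteins.
interacting_pairs = [
    ("MVKQIESKTAFQEALDAAGDK", "MSDKIIHLTDDSFDTDVLKAD"),
    ("MKTAYIAKQRQISFVKSHFSR", "MGSSHHHHHHSSGLVPRGSHM"),
]
negatives = make_shuffled_negatives(interacting_pairs, seed=42)

for (a, b), (sa, sb) in zip(interacting_pairs, negatives):
    # Composition is identical before and after shuffling.
    print(Counter(a) == Counter(sa), Counter(b) == Counter(sb))
```

Negatives built this way differ from real proteins in residue order only, which is consistent with the abstract's explanation of why they are easier for a classifier to separate than real noninteracting sequences.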
Pages: 876-884
Number of pages: 9