Effect of training datasets on support vector machine prediction of protein-protein interactions

Times cited: 62
Authors
Lo, SL
Cai, CZ
Chen, YZ
Chung, MCM
Affiliations
[1] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
[2] Natl Univ Singapore, Bioproc Technol Inst, Singapore 117597, Singapore
[3] Natl Univ Singapore, Dept Computat Sci, Singapore 117597, Singapore
[4] Natl Univ Singapore, Dept Biochem, Singapore 117597, Singapore
Keywords
database of interacting proteins; protein function prediction; protein-protein interaction; shuffled sequence; support vector machine; SVMlight
DOI
10.1002/pmic.200401118
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the Support Vector Machine (SVM), has recently been explored for predicting protein-protein interactions using artificially shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results obtained with real protein sequences against those obtained with shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated by an exclusion analysis combined with subcellular localization information for interacting proteins found in the Database of Interacting Proteins. Prediction accuracy is 76.9% using real protein sequences, compared with 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be harder than separating a set of real protein sequences from a set of artificial sequences. Using real protein sequences to train an SVM classification system is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful in developing SVM classification systems for protein-protein interaction prediction.
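The abstract contrasts two ways of building the negative (noninteracting) training set. The Python sketch below illustrates both under stated assumptions: shuffled negatives are made by permuting the residues of one partner in each known interacting pair (one plausible reading of the Bock and Gough approach, not necessarily their exact protocol), and "real" negatives are protein pairs that are never recorded as interacting and whose annotated subcellular compartments do not overlap (a simplified, hypothetical stand-in for the authors' exclusion analysis). The function names, protein IDs, sequences, and localizations are all hypothetical, and feature encoding and SVM training (e.g. with SVMlight) are not shown.

```python
import random
from itertools import combinations


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq: same residue composition, new order."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def shuffled_negatives(positive_pairs, seqs, rng):
    """Artificial negatives: pair each first partner with a shuffled copy of
    its interacting partner (assumed scheme, for illustration only)."""
    return [(seqs[a], shuffle_sequence(seqs[b], rng)) for a, b in positive_pairs]


def localization_negatives(proteins, localization, interacting):
    """Real-sequence negatives: protein pairs that are not recorded as
    interacting and share no annotated subcellular compartment
    (a simplified, hypothetical version of an exclusion analysis)."""
    negatives = []
    for a, b in combinations(sorted(proteins), 2):
        if frozenset((a, b)) in interacting:
            continue  # known interaction: never use as a negative
        if localization[a].isdisjoint(localization[b]):
            negatives.append((a, b))  # no shared compartment: assumed noninteracting
    return negatives


# Hypothetical toy data (not taken from the paper or from DIP).
seqs = {"P1": "MKTAYIAK", "P2": "GSHMLEDP", "P3": "MAQTLLRE", "P4": "MSDNEKVV"}
localization = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytoplasm"},
    "P3": {"mitochondrion"},
    "P4": {"cytoplasm"},
}
interacting = {frozenset(("P1", "P2"))}

rng = random.Random(0)
print(shuffled_negatives([("P1", "P2")], seqs, rng))
print(localization_negatives(seqs, localization, interacting))
# → [('P1', 'P3'), ('P1', 'P4'), ('P2', 'P3'), ('P3', 'P4')]
# P2/P4 is excluded because both are annotated to the cytoplasm.
```

The contrast mirrors the abstract's explanation: a shuffled sequence keeps amino acid composition but loses native sequence order, so a classifier may separate it from real proteins more easily, whereas localization-derived negatives are real proteins and make the classification task harder.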
Pages: 876-884
Number of pages: 9
Related papers (50 in total)
  • [31] Prediction and redesign of protein-protein interactions
    Lua, Rhonald C.
    Marciano, David C.
    Katsonis, Panagiotis
    Adikesavan, Anbu K.
    Wilkins, Angela D.
    Lichtarge, Olivier
    PROGRESS IN BIOPHYSICS & MOLECULAR BIOLOGY, 2014, 116 (2-3) : 194 - 202
  • [32] Prediction Protein-Protein Interactions with LSTM
    Tao, Zheng
    Yao, Jiahao
    Yuan, Chao
    Zhao, Ning
    Yang, Bin
    Chen, Baitong
    Bao, Wenzheng
    SIMULATION TOOLS AND TECHNIQUES, SIMUTOOLS 2021, 2022, 424 : 540 - 545
  • [33] Prediction of physical protein-protein interactions
    Szilágyi, A
    Grimm, V
    Arakaki, AK
    Skolnick, J
    PHYSICAL BIOLOGY, 2005, 2 (02) : S1 - S16
  • [34] Protein Features Identification for Machine Learning-Based Prediction of Protein-Protein Interactions
    Raza, Khalid
    INFORMATION, COMMUNICATION AND COMPUTING TECHNOLOGY, 2017, 750 : 305 - 317
  • [36] Combining protein-protein interactions information with support vector machine to identify chronic obstructive pulmonary disease related genes
    Hua, Lin
    Zhou, Ping
    MOLECULAR BIOLOGY, 2014, 48 (02) : 287 - 296
  • [37] Improving protein-protein interaction prediction based on phylogenetic information using a least-squares support vector machine
    Craig, Roger A.
    Liao, Li
    REVERSE ENGINEERING BIOLOGICAL NETWORKS: OPPORTUNITIES AND CHALLENGES IN COMPUTATIONAL METHODS FOR PATHWAY INFERENCE, 2007, 1115 : 154 - 167
  • [38] Prediction of Protein Coding Regions by Support Vector Machine
    Guo Shuo
    Zhu Yi-sheng
    2009 INTERNATIONAL SYMPOSIUM ON INTELLIGENT UBIQUITOUS COMPUTING AND EDUCATION, 2009 : 185 - 188
  • [39] Identification of surface residues involved in protein-protein interaction - A support vector machine approach
    Yan, CH
    Dobbs, D
    Honavar, V
    INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2003 : 53 - 62
  • [40] Support Vector Machines for Predicting Protein-Protein Interactions using Domains and Hydrophobicity Features
    Alashwal, Hany
    Deris, Safaai
    Othman, Razib M.
    2006 INTERNATIONAL CONFERENCE ON COMPUTING & INFORMATICS (ICOCI 2006), 2006 : 574 - +