Utilizing knowledge base of amino acids structural neighborhoods to predict protein-protein interaction sites

Cited by: 5
Authors
Jelinek, Jan [1]
Skoda, Petr [1]
Hoksza, David [1]
Affiliations
[1] Charles Univ Prague, Fac Math & Phys, Dept Software Engn, Karlovu 3, Prague 2, Czech Republic
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Keywords
Protein-protein interaction; Prediction; Molecular fingerprints; Data mining; INTERFACES; FINGERPRINTS; PERFORMANCE; FEATURES
DOI
10.1186/s12859-017-1921-4
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Protein-protein interactions (PPIs) play a key role in the investigation of various biochemical processes, and their identification is thus of great importance. Although computational prediction of which amino acids take part in a PPI has been an active field of research for some time, the quality of in-silico methods is still far from perfect.
Results: We have developed a novel prediction method called INSPiRE, which benefits from a knowledge base built from data available in the Protein Data Bank. All proteins involved in PPIs were converted into labeled graphs, with nodes corresponding to amino acids and edges to pairs of neighboring amino acids. The structural neighborhood of each node was then encoded into a bit string and stored in the knowledge base. When predicting PPIs, INSPiRE labels the amino acids of unknown proteins as interface or non-interface based on how often their structural neighborhood appears as interface or non-interface in the knowledge base. We evaluated INSPiRE's behavior with respect to different types and sizes of the structural neighborhood. Furthermore, we examined the suitability of several different features for labeling the nodes. Our evaluations showed that INSPiRE clearly outperforms existing methods with respect to the Matthews correlation coefficient.
Conclusion: In this paper we introduce a new knowledge-based method for the identification of protein-protein interaction sites, called INSPiRE. Its knowledge base stores structural patterns of known interaction sites in the Protein Data Bank, which are then used for PPI prediction. Extensive experiments on several well-established datasets show that INSPiRE significantly surpasses existing PPI prediction approaches.
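The knowledge-base lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the INSPiRE implementation: the function names, the adjacency-dict graph format, the tuple-based neighborhood key (a simplified stand-in for the bit-string fingerprint), and the 0.5 voting threshold are all assumptions made for illustration. The last function computes the Matthews correlation coefficient from its standard definition.

```python
import math
from collections import Counter, defaultdict, deque


def neighborhood_key(graph, labels, center, radius):
    """Hypothetical encoding: the center label plus the sorted
    (distance, label) pairs of all nodes within `radius` edges."""
    dist = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search limited to `radius` steps
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    ring = sorted((d, labels[n]) for n, d in dist.items() if n != center)
    return (labels[center], tuple(ring))


def build_knowledge_base(proteins, radius=1):
    """Count how often each neighborhood key occurs as interface (True)
    or non-interface (False) in the training proteins."""
    kb = defaultdict(Counter)
    for graph, labels, interface in proteins:
        for node in graph:
            key = neighborhood_key(graph, labels, node, radius)
            kb[key][node in interface] += 1
    return kb


def predict(kb, graph, labels, radius=1, threshold=0.5):
    """Label a node as interface when its neighborhood was seen as
    interface in at least `threshold` of its occurrences (assumed rule)."""
    predicted = set()
    for node in graph:
        counts = kb.get(neighborhood_key(graph, labels, node, radius))
        if not counts:
            continue  # unseen neighborhood: treated as non-interface here
        total = counts[True] + counts[False]
        if counts[True] / total >= threshold:
            predicted.add(node)
    return predicted


def mcc(predicted, actual, all_nodes):
    """Matthews correlation coefficient over sets of node ids."""
    tp = len(predicted & actual)
    fp = len(predicted - actual)
    fn = len(actual - predicted)
    tn = len(all_nodes) - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (invented): one training chain and one query chain.
train_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
train_labels = {0: "ALA", 1: "GLY", 2: "LEU", 3: "GLY"}
kb = build_knowledge_base([(train_graph, train_labels, {0, 1})])

query_graph = {0: [1], 1: [0, 2], 2: [1]}
query_labels = {0: "ALA", 1: "GLY", 2: "LEU"}
print(sorted(predict(kb, query_graph, query_labels)))
```

In this sketch an unseen neighborhood defaults to non-interface; how a real predictor handles such cases, and which node features serve as labels, are design choices the abstract leaves open.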
Pages: 10