Mining of protein-protein interfacial residues from massive protein sequential and spatial data

Cited by: 5
Authors
Wang, Debby D. [1]
Zhou, Weiqiang [1]
Yan, Hong [1]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Protein-protein interface prediction; 3D alpha shape modeling; Residue sequence profile; Joint mutual information (JMI); Neuro-fuzzy classifiers (NFCs); Neighborhood classifiers (NECs); CART; Extreme learning machines (ELMs); Naive Bayesian classifiers (NBCs); BIG DATA; INTERACTION SITES; DATA-BANK; INFORMATION; PREDICTION; NETWORK;
DOI
10.1016/j.fss.2014.01.017
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Processing big data is a great challenge in bioinformatics. In this paper, we address the problem of identifying protein-protein interfacial residues from massive protein structural data. A protein set comprising 154,993 residues was analyzed. We applied three-dimensional alpha shape modeling to search for surface and interfacial residues in this set, and adopted spatially neighboring residue profiles to characterize each residue. These residue profiles, which capture the sequential and spatial information of proteins, translate the original data into a large matrix. After refining this matrix vertically and horizontally, we implemented and compared a series of popular learning procedures, including neuro-fuzzy classifiers (NFCs), CART, neighborhood classifiers (NECs), extreme learning machines (ELMs) and naive Bayesian classifiers (NBCs), to predict the interfacial residues, aiming to investigate the sensitivity of these massive structural data to different learning mechanisms. As a result, ELMs, CART and NFCs performed better in terms of computational cost, while NFCs, NBCs and ELMs provided favorable prediction accuracies. Overall, NFCs, NBCs and ELMs are favorable choices for handling this type of data quickly and accurately. More importantly, the marginal differences between the prediction performances of these methods imply that this type of data is insensitive to the choice of learning mechanism. (C) 2014 Elsevier B.V. All rights reserved.
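The abstract does not say how a spatially neighboring residue profile is assembled, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes each residue is reduced to one representative 3D coordinate and that a profile is the residue's own type followed by the types of its k nearest spatial neighbors. The function name `neighbor_profile`, the parameter `k`, and the toy coordinates are all hypothetical.

```python
import math


def neighbor_profile(coords, labels, k=2):
    """Return one row per residue: its own label, then the labels of its
    k nearest spatial neighbors (an assumed profile definition)."""
    profiles = []
    for i, center in enumerate(coords):
        # Rank all other residues by Euclidean distance to residue i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(center, coords[j]),
        )
        profiles.append([labels[i]] + [labels[j] for j in others[:k]])
    return profiles


# Toy data: four residues with made-up coordinates (illustrative only).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
labels = ["ALA", "GLY", "SER", "LYS"]
profiles = neighbor_profile(coords, labels, k=2)

for row in profiles:
    print(row)
```

Stacking one such row per residue gives a matrix with one row per residue and k + 1 columns, which is the kind of large matrix the abstract describes being refined before classification. The brute-force neighbor search here is quadratic in the number of residues; a data set of the size reported would likely call for a spatial index instead.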
Pages: 101-116
Number of pages: 16