Evolutionary couplings and sequence variation effect predict protein binding sites

Cited by: 15
Authors
Schelling, Maria [1]
Hopf, Thomas A. [1, 2, 3]
Rost, Burkhard [1, 4, 5, 6, 7]
Affiliations
[1] TUM, Dept Informat Bioinformat & Computat Biol i12, Boltzmannstr 3, D-85748 Garching, Germany
[2] Harvard Med Sch, Dept Syst Biol, Boston, MA USA
[3] Harvard Med Sch, Dept Cell Biol, Boston, MA USA
[4] TUM, IAS, Garching, Germany
[5] TUM, Sch Life Sci Weihenstephan WZW, Freising Weihenstephan, Germany
[6] Columbia Univ, Dept Biochem & Mol Biophys, New York, NY USA
[7] Columbia Univ, New York Consortium Membrane Prot Struct NYCOMP, New York, NY USA
Keywords
binding site; coevolution; evolutionary couplings; machine learning; neural network; prediction; sequence variation
DOI
10.1002/prot.25585
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Binding small ligands such as ions or macromolecules such as DNA, RNA, and other proteins is one important aspect of the molecular function of proteins. Many binding sites remain without experimental annotations. Predicting binding sites on a per-residue level is challenging, but if 3D structures are known, information about coevolving residue pairs (evolutionary couplings) can predict catalytic residues through mutual information. Here, we predicted protein binding sites from evolutionary couplings derived from a global statistical model using maximum entropy. Additionally, we included information from sequence variation. A simple method using a weighted sum over eight scores substantially outperformed random (F1 = 19.3% ± 0.7% vs F1 = 2% for random). Training a neural network on these eight scores (along with predicted solvent accessibility and conservation in protein families) improved performance substantially (F1 = 26.2% ± 0.8%). Although the machine learning was limited by the small data set and possibly wrong annotations of binding sites, the predicted binding sites formed spatial clusters in the protein. The source code of the binding site predictions is available through GitHub.
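To make the abstract's evaluation concrete, the following is a minimal sketch of how a weighted sum over eight per-residue scores could be thresholded into binding/non-binding labels and then scored with F1. It is not the authors' implementation: the function names, the uniform weights, the threshold, and the toy data are all assumptions chosen for illustration, since this record does not give the actual scores or weights.

```python
from typing import List, Sequence


def weighted_sum(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine one residue's feature scores into a single value."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))


def predict_binding(
    residue_scores: Sequence[Sequence[float]],
    weights: Sequence[float],
    threshold: float,
) -> List[bool]:
    """Label a residue as binding when its weighted sum reaches the threshold."""
    return [weighted_sum(s, weights) >= threshold for s in residue_scores]


def f1_score(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall over per-residue labels."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if o and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four residues, eight scores each (not from the paper).
weights = [1.0] * 8                      # assumed uniform weights
residue_scores = [[0.9] * 8, [0.1] * 8, [0.6] * 8, [0.2] * 8]
observed = [True, False, False, True]    # assumed annotated binding residues

predicted = predict_binding(residue_scores, weights, threshold=4.0)
f1 = f1_score(predicted, observed)
print(predicted, round(f1, 3))
```

In this toy case one of the two predicted residues is a true binding residue and one of the two annotated residues is recovered, so precision and recall are both 0.5. Because binding residues are typically a small minority of all residues, a random predictor scores very low on F1, which is consistent with the 2% random baseline quoted in the abstract.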
Pages: 1064-1074
Number of pages: 11
Related papers
50 in total
  • [1] Sequence-Based Prediction of Protein-Peptide Binding Sites Using Support Vector Machine
    Taherzadeh, Ghazaleh
    Yang, Yuedong
    Zhang, Tuo
    Liew, Alan Wee-Chung
    Zhou, Yaoqi
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2016, 37 (13) : 1223 - 1229
  • [2] Coupling dynamics and evolutionary information with structure to identify protein regulatory and functional binding sites
    Mishra, Sambit K.
    Kandoi, Gaurav
    Jernigan, Robert L.
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2019, 87 (10) : 850 - 868
  • [3] Predicting Metal-Binding Sites from Protein Sequence
    Passerini, Andrea
    Lippi, Marco
    Frasconi, Paolo
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (01) : 203 - 213
  • [4] Evolutionary plasticity of protein families: Coupling between sequence and structure variation
    Panchenko, AR
    Wolf, YI
    Panchenko, LA
    Madej, T
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2005, 61 (03) : 535 - 544
  • [5] NTyroSite: Computational Identification of Protein Nitrotyrosine Sites Using Sequence Evolutionary Features
    Hasan, Md. Mehedi
    Khatun, Mst. Shamima
    Mollah, Md. Nurul Haque
    Cao Yong
    Guo Dianjing
    MOLECULES, 2018, 23 (07):
  • [6] Relationship between protein thermodynamic constraints and variation of evolutionary rates among sites
    Echave, Julian
    Jackson, Eleisha L.
    Wilke, Claus O.
    PHYSICAL BIOLOGY, 2015, 12 (02)
  • [7] Generalizing and learning protein-DNA binding sequence representations by an evolutionary algorithm
    Wong, Ka-Chun
    Peng, Chengbin
    Wong, Man-Hon
    Leung, Kwong-Sak
    SOFT COMPUTING, 2011, 15 (08) : 1631 - 1642
  • [8] Identification of Protein-Ligand Binding Sites by Sequence Information and Ensemble Classifier
    Ding, Yijie
    Tang, Jijun
    Guo, Fei
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2017, 57 (12) : 3149 - 3161
  • [9] Combining Sequence and Epigenomic Data to Predict Transcription Factor Binding Sites Using Deep Learning
    Jing, Fang
    Zhang, Shao-Wu
    Cao, Zhen
    Zhang, Shihua
    BIOINFORMATICS RESEARCH AND APPLICATIONS, ISBRA 2018, 2018, 10847 : 241 - 252
  • [10] Predicting Protein-RNA Binding sites using sequence statistical Feature of amino acids
    Liu, Xin-Mi
    Gong, Xiu-Jun
    2012 INTERNATIONAL CONFERENCE ON MEDICAL PHYSICS AND BIOMEDICAL ENGINEERING (ICMPBE2012), 2012, 33 : 334 - 340