Evolutionary couplings and sequence variation effect predict protein binding sites

Cited by: 15
Authors
Schelling, Maria [1]
Hopf, Thomas A. [1,2,3]
Rost, Burkhard [1,4,5,6,7]
Affiliations
[1] TUM, Dept Informat Bioinformat & Computat Biol i12, Boltzmannstr 3, D-85748 Garching, Germany
[2] Harvard Med Sch, Dept Syst Biol, Boston, MA USA
[3] Harvard Med Sch, Dept Cell Biol, Boston, MA USA
[4] TUM, IAS, Garching, Germany
[5] TUM, Sch Life Sci Weihenstephan WZW, Freising Weihenstephan, Germany
[6] Columbia Univ, Dept Biochem & Mol Biophys, New York, NY USA
[7] Columbia Univ, New York Consortium Membrane Prot Struct NYCOMP, New York, NY USA
Keywords
binding site; coevolution; evolutionary couplings; machine learning; neural network; prediction; sequence variation
DOI
10.1002/prot.25585
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Binding small ligands such as ions or macromolecules such as DNA, RNA, and other proteins is one important aspect of the molecular function of proteins. Many binding sites remain without experimental annotations. Predicting binding sites on a per-residue level is challenging, but if 3D structures are known, information about coevolving residue pairs (evolutionary couplings) can predict catalytic residues through mutual information. Here, we predicted protein binding sites from evolutionary couplings derived from a global statistical model using maximum entropy. Additionally, we included information from sequence variation. A simple method using a weighted sum over eight scores substantially outperformed random (F1 = 19.3% ± 0.7% vs F1 = 2% for random). Training a neural network on these eight scores (along with predicted solvent accessibility and conservation in protein families) improved performance substantially (F1 = 26.2% ± 0.8%). Although the machine learning was limited by the small data set and possibly wrong annotations of binding sites, the predicted binding sites formed spatial clusters in the protein. The source code of the binding site predictions is available through GitHub: .
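The weighted-sum baseline and the F1 measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weights, the threshold, and the toy scores are all hypothetical, and the abstract does not say which eight scores are combined or how the weights were chosen.

```python
from typing import List, Sequence


def weighted_sum(scores: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Combine the per-residue scores into one value per residue.

    `scores` holds one row per residue; each row has one entry per score
    (eight in the paper, but any number works here). The weights are
    hypothetical placeholders, not values from the paper.
    """
    return [sum(w * s for w, s in zip(weights, row)) for row in scores]


def predict_binding(scores: Sequence[Sequence[float]],
                    weights: Sequence[float],
                    threshold: float) -> List[bool]:
    """Label a residue as binding if its weighted sum reaches the threshold."""
    return [total >= threshold for total in weighted_sum(scores, weights)]


def f1_score(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Per-residue F1: harmonic mean of precision and recall."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (assumed): four residues, two scores each, equal weights.
toy_scores = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
toy_weights = [0.5, 0.5]
toy_observed = [True, False, False, True]

toy_predicted = predict_binding(toy_scores, toy_weights, threshold=0.5)
toy_f1 = f1_score(toy_predicted, toy_observed)
```

With real data, each row would contain the eight scores per residue, and the threshold would be tuned on a training set rather than fixed by hand.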
Pages: 1064-1074
Number of pages: 11