PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models

Cited by: 3
Authors
Zhang, Lingrong [1]
Liu, Taigang [1]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
Keywords
Protein-DNA binding site; Pre-trained protein language model; Model interpretability; RESIDUES; SEQUENCE; COMPLEX;
DOI
10.1016/j.ijbiomac.2024.136147
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Protein-DNA interactions play critical roles in various biological processes and are essential for drug discovery. However, traditional experimental methods are labor-intensive and unable to keep pace with the increasing volume of protein sequences, leading to a substantial number of proteins lacking DNA-binding annotations. Therefore, developing an efficient computational method to identify protein-DNA binding sites is crucial. Unfortunately, most existing computational methods rely on manually selected features or protein structure information, making these methods inapplicable to large-scale prediction tasks. In this study, we introduced PDNAPred, a sequence-based method that combines two pre-trained protein language models with a designed CNN-GRU network to identify DNA-binding sites. Additionally, to tackle the issue of imbalanced dataset samples, we employed focal loss. Our comprehensive experiments demonstrated that PDNAPred significantly improved the accuracy of DNA-binding site prediction, outperforming existing state-of-the-art sequence-based methods. Remarkably, PDNAPred also achieved results comparable to advanced structure-based methods. The designed CNN-GRU network enhances its capability to detect DNA-binding sites accurately. Furthermore, we validated the versatility of PDNAPred by training it on RNA-binding site datasets, showing its potential as a general framework for amino acid binding site prediction. Finally, we conducted model interpretability analysis to elucidate the reasons behind PDNAPred's outstanding performance.
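The abstract names focal loss as the remedy for class imbalance (binding residues are far rarer than non-binding ones) but gives no formula or settings. As a minimal sketch, assuming the standard binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), the function below shows how confidently classified residues are down-weighted. The function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are illustrative assumptions, not values taken from PDNAPred.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss over per-residue predictions.

    probs  : predicted probability that each residue is a binding site
    labels : 1 for binding, 0 for non-binding
    gamma, alpha : assumed hyperparameters (not specified in the abstract)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    total = 0.0
    for p, y in zip(probs, labels):
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        # alpha_t weights the positive (binding) class by alpha.
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0).
        p_t = min(max(p_t, eps), 1.0)
        # The (1 - p_t)^gamma factor shrinks the loss of easy examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, so `gamma` alone controls how strongly easy residues are discounted.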
Pages: 12
Related papers
50 in total
  • [31] Improved prediction of MHC-peptide binding using protein language models
    Hashemi, Nasser
    Hao, Boran
    Ignatov, Mikhail
    Paschalidis, Ioannis Ch.
    Vakili, Pirooz
    Vajda, Sandor
    Kozakov, Dima
    FRONTIERS IN BIOINFORMATICS, 2023, 3
  • [32] T4SEfinder: a bioinformatics tool for genome-scale prediction of bacterial type IV secreted effectors using pre-trained protein language model
    Zhang, Yumeng
    Zhang, Yangming
    Xiong, Yi
    Wang, Hui
    Deng, Zixin
    Song, Jiangning
    Ou, Hong-Yu
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [33] Pair-EGRET: enhancing the prediction of protein-protein interaction sites through graph attention networks and protein language models
    Alam, Ramisa
    Mahbub, Sazan
    Bayzid, Md Shamsuzzoha
    BIOINFORMATICS, 2024, 40 (10)
  • [34] A Transformer-Based Deep Learning Approach with Multi-layer Feature Processing for Accurate Prediction of Protein-DNA Binding Residues
    Zhao, Haipeng
    Zhu, Baozhong
    Jiang, Tengsheng
    Cui, Zhiming
    Wu, Hongjie
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT III, 2023, 14088 : 556 - 567
  • [35] ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction
    Tubiana, Jerome
    Schneidman-Duhovny, Dina
    Wolfson, Haim J.
    NATURE METHODS, 2022, 19 (06) : 730 - +
  • [36] RBPsuite: RNA-protein binding sites prediction suite based on deep learning
    Pan, Xiaoyong
    Fang, Yi
    Li, Xianfeng
    Yang, Yang
    Shen, Hong-Bin
    BMC GENOMICS, 2020, 21 (01)
  • [37] Structure-based computational analysis of protein binding sites for function and druggability prediction
    Nisius, Britta
    Sha, Fan
    Gohlke, Holger
    JOURNAL OF BIOTECHNOLOGY, 2012, 159 (03) : 123 - 134
  • [38] A Review About RNA-Protein-Binding Sites Prediction Based on Deep Learning
    Yan, Jianrong
    Zhu, Min
    IEEE ACCESS, 2020, 8 : 150929 - 150944
  • [39] Prediction of DNA-Binding Protein-Drug-Binding Sites Using Residue Interaction Networks and Sequence Feature
    Wang, Wei
    Zhang, Yu
    Liu, Dong
    Zhang, HongJun
    Wang, XianFang
    Zhou, Yun
    FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, 2022, 10
  • [40] StackCBPred: A stacking based prediction of protein-carbohydrate binding sites from sequence
    Gattani, Suraj
    Mishra, Avdesh
    Hoque, Md Tamjidul
    CARBOHYDRATE RESEARCH, 2019, 486