PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models

Cited by: 3
Authors
Zhang, Lingrong [1]
Liu, Taigang [1]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
Keywords
Protein-DNA binding site; Pre-trained protein language model; Model interpretability; RESIDUES; SEQUENCE; COMPLEX;
DOI
10.1016/j.ijbiomac.2024.136147
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Protein-DNA interactions play critical roles in various biological processes and are essential for drug discovery. However, traditional experimental methods are labor-intensive and unable to keep pace with the increasing volume of protein sequences, leading to a substantial number of proteins lacking DNA-binding annotations. Therefore, developing an efficient computational method to identify protein-DNA binding sites is crucial. Unfortunately, most existing computational methods rely on manually selected features or protein structure information, making these methods inapplicable to large-scale prediction tasks. In this study, we introduced PDNAPred, a sequence-based method that combines two pre-trained protein language models with a designed CNN-GRU network to identify DNA-binding sites. Additionally, to tackle the issue of imbalanced dataset samples, we employed focal loss. Our comprehensive experiments demonstrated that PDNAPred significantly improved the accuracy of DNA-binding site prediction, outperforming existing state-of-the-art sequence-based methods. Remarkably, PDNAPred also achieved results comparable to advanced structure-based methods. The designed CNN-GRU network enhances its capability to detect DNA-binding sites accurately. Furthermore, we validated the versatility of PDNAPred by training it on RNA-binding site datasets, showing its potential as a general framework for amino acid binding site prediction. Finally, we conducted model interpretability analysis to elucidate the reasons behind PDNAPred's outstanding performance.
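The abstract names focal loss as the remedy for class imbalance, since binding residues are far rarer than non-binding ones. The following is a minimal sketch of a binary focal loss over per-residue predictions, not the authors' implementation. The function name `binary_focal_loss` and the values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration; the record does not state the settings used in PDNAPred.

```python
import math


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over per-residue predictions.

    probs  : predicted probabilities that each residue is a binding site
    labels : 1 for a binding residue, 0 for a non-binding residue
    gamma  : focusing parameter (assumed value, not taken from the paper)
    alpha  : weight of the positive class (assumed value, not taken from the paper)
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified residues
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


if __name__ == "__main__":
    # One confidently correct negative, one missed positive.
    print(binary_focal_loss([0.05, 0.30], [0, 1]))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which makes the effect of the focusing term easy to check in isolation.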
Pages: 12
Related papers
50 in total
  • [1] Prediction of Protein-DNA Binding Sites Based on Protein Language Model and Deep Learning
    Shan, Kaixuan
    Zhang, Xiankun
    Song, Chen
    ADVANCED INTELLIGENT COMPUTING IN BIOINFORMATICS, PT II, ICIC 2024, 2024, 14882 : 314 - 325
  • [2] Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning
    Wang, Jue
    Liu, Yufan
    Tian, Boxue
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01):
  • [3] PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neutral network
    Zhang, Lingrong
    Liu, Taigang
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2024, 280
  • [4] Decomposing protein-DNA binding and recognition using simplified protein models
    Etheve, Loic
    Martin, Juliette
    Lavery, Richard
    NUCLEIC ACIDS RESEARCH, 2017, 45 (17) : 10270 - 10283
  • [5] Identifying Protein-Nucleotide Binding Residues via Grouped Multi-task Learning and Pre-trained Protein Language Models
    Wu, Jiashun
    Liu, Yan
    Zhang, Ying
    Wang, Xiaoyu
    Yan, He
    Zhu, Yiheng
    Song, Jiangning
    Yu, Dong-Jun
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2025, 65 (02) : 1040 - 1052
  • [6] Structural Models of Protein-DNA Complexes Based on Interface Prediction and Docking
    Qin, Sanbo
    Zhou, Huan-Xiang
    CURRENT PROTEIN & PEPTIDE SCIENCE, 2011, 12 (06) : 531 - 539
  • [7] Interpretable Prediction of SARS-CoV-2 Epitope-Specific TCR Recognition Using a Pre-Trained Protein Language Model
    Yoo, Sunyong
    Jeong, Myeonghyeon
    Seomun, Subhin
    Kim, Kiseong
    Han, Youngmahn
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2024, 21 (03) : 428 - 438
  • [8] Signatures of Protein-DNA Recognition in Free DNA Binding Sites
    Locasale, Jason W.
    Napoli, Andrew A.
    Chen, Shengfeng
    Berman, Helen M.
    Lawson, Catherine L.
    JOURNAL OF MOLECULAR BIOLOGY, 2009, 386 (04) : 1054 - 1065
  • [9] LMNglyPred: prediction of human N-linked glycosylation sites using embeddings from a pre-trained protein language model
    Pakhrin, Subash C.
    Pokharel, Suresh
    Aoki-Kinoshita, Kiyoko F.
    Beck, Moriah R.
    Dam, Tarun K.
    Caragea, Doina
    Kc, Dukka B.
    GLYCOBIOLOGY, 2023, 33 (05) : 411 - 422
  • [10] GraphPBSP: Protein binding site prediction based on Graph Attention Network and pre-trained model ProstT5
    Sun, Xiaohan
    Wu, Zhixiang
    Su, Jingjie
    Li, Chunhua
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2024, 282