PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models

Cited by: 3
Authors
Zhang, Lingrong [1]
Liu, Taigang [1]
Affiliations
[1] Shanghai Ocean University, College of Information Technology, Shanghai 201306, People's Republic of China
Keywords
Protein-DNA binding site; Pre-trained protein language model; Model interpretability; RESIDUES; SEQUENCE; COMPLEX;
DOI
10.1016/j.ijbiomac.2024.136147
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Protein-DNA interactions play critical roles in various biological processes and are essential for drug discovery. However, traditional experimental methods are labor-intensive and unable to keep pace with the increasing volume of protein sequences, leading to a substantial number of proteins lacking DNA-binding annotations. Therefore, developing an efficient computational method to identify protein-DNA binding sites is crucial. Unfortunately, most existing computational methods rely on manually selected features or protein structure information, making these methods inapplicable to large-scale prediction tasks. In this study, we introduced PDNAPred, a sequence-based method that combines two pre-trained protein language models with a designed CNN-GRU network to identify DNA-binding sites. Additionally, to tackle the issue of imbalanced dataset samples, we employed focal loss. Our comprehensive experiments demonstrated that PDNAPred significantly improved the accuracy of DNA-binding site prediction, outperforming existing state-of-the-art sequence-based methods. Remarkably, PDNAPred also achieved results comparable to advanced structure-based methods. The designed CNN-GRU network enhances its capability to detect DNA-binding sites accurately. Furthermore, we validated the versatility of PDNAPred by training it on RNA-binding site datasets, showing its potential as a general framework for amino acid binding site prediction. Finally, we conducted model interpretability analysis to elucidate the reasons behind PDNAPred's outstanding performance.
Pages: 12
Related papers
50 in total
  • [21] MetaDBSite: a meta approach to improve protein DNA-binding sites prediction
    Si, Jingna
    Zhang, Zengming
    Lin, Biaoyang
    Schroeder, Michael
    Huang, Bingding
    BMC SYSTEMS BIOLOGY, 2011, 5
  • [22] MetaDBSite: a Meta Approach to Improve Protein DNA-Binding Sites Prediction
    Si, JingNa
    Zhang, Zengming
    Lin, Biaoyang
    Schroeder, Michael
    Huang, Bingding
    COMPUTATIONAL SYSTEMS BIOLOGY, 2010, 13 : 266 - +
  • [23] A feature-based approach to predict hot spots in protein-DNA binding interfaces
    Zhang, Sijia
    Zhao, Le
    Zheng, Chun-Hou
    Xia, Junfeng
    BRIEFINGS IN BIOINFORMATICS, 2020, 21 (03) : 1038 - 1046
  • [24] VesiMCNN: Using pre-trained protein language models and multiple window scanning convolutional neural networks to identify vesicular transport proteins
    Le, Van The
    Tseng, Yi-Hsuan
    Liu, Yu-Chen
    Malik, Muhammad Shahid
    Ou, Yu-Yen
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2024, 280
  • [25] A polyplex qPCR-based binding assay for protein-DNA interactions
    Moreau, Morgane J. J.
    Schaeffer, Patrick M.
    ANALYST, 2012, 137 (18) : 4111 - 4113
  • [26] Mesoscopic Model and Free Energy Landscape for Protein-DNA Binding Sites: Analysis of Cyanobacterial Promoters
    Tapia-Rojo, Rafael
    Jose Mazo, Juan
    Hernandez, Jose Angel
    Luisa Peleato, Maria
    Fillat, Maria F.
    Falo, Fernando
    PLOS COMPUTATIONAL BIOLOGY, 2014, 10 (10)
  • [27] A Protein-DNA Binding Site Prediction Method Based on Multi-View Feature Fusion of Adjacent Residue
    Yang, Ji
    Zhang, Shuning
    IEEE ACCESS, 2023, 11 : 79609 - 79623
  • [28] Protein-DNA Binding Residues Prediction Using a Deep Learning Model With Hierarchical Feature Extraction
    Guan, Shixuan
    Zou, Quan
    Wu, Hongjie
    Ding, Yijie
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (05) : 2619 - 2628
  • [29] Systematic prediction of DNA shape changes due to CpG methylation explains epigenetic effects on protein-DNA binding
    Rao, Satyanarayan
    Chiu, Tsu-Pei
    Kribelbauer, Judith F.
    Mann, Richard S.
    Bussemaker, Harmen J.
    Rohs, Remo
    EPIGENETICS & CHROMATIN, 2018, 11
  • [30] A brief survey of deep learning-based models for CircRNA-protein binding sites prediction
    Shen, Zhen
    Yuan, Lin
    Bao, Wenzheng
    Wang, Siguo
    Zhang, Qinhu
    Huang, De-Shuang
    NEUROCOMPUTING, 2025, 628