Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory

Times cited: 0
Authors
Zhao, Haipeng [1]
Zhu, Baozhong [1]
Jiang, Tengsheng [2]
Cui, Zhiming [1]
Wu, Hongjie [1]
Affiliations
[1] Suzhou Univ Sci & Technol, Sch Elect & Informat Engn, Suzhou, Peoples R China
[2] Nanjing Med Univ, Gusu Sch, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA-protein binding residues identification; Transformer encoder; BiLSTM; deep learning; STRUCTURE-BASED PREDICTION; SITES;
DOI
10.3934/mbe.2024008
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
DNA-protein binding is crucial for the normal development and function of organisms. Accurately identifying DNA-protein binding sites matters because it supports disease prevention and the development of new approaches to disease treatment. In the present study, we introduce a precise and robust identifier for DNA-protein binding residues. For protein representation, we combine the evolutionary information of the protein, represented by its position-specific scoring matrix, with the spatial information of the protein's secondary structure, enriching the overall informational content. The method first employs a combination of Bi-directional Long Short-Term Memory and a Transformer encoder to jointly extract the interdependencies among residues within the protein sequence. Convolutional operations are then applied to the resulting feature matrix to capture local features of the residues. Experimental results on the benchmark dataset show that our method is highly competitive with contemporary classifiers. Specifically, it achieved an MCC of 0.349, SP of 96.50%, SN of 44.03% and ACC of 94.59% on the PDNA-41 dataset.
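
The abstract reports MCC, SP, SN and ACC without defining them. Assuming these are the usual Matthews correlation coefficient, specificity, sensitivity and accuracy for a per-residue binary classifier, the sketch below shows how the four values follow from a confusion matrix. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the paper or from the PDNA-41 dataset.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute SN, SP, ACC and MCC from confusion-matrix counts.

    Assumes the standard definitions; the abstract does not spell them out.
    """
    # Sensitivity: fraction of true binding residues that were predicted as binding.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-binding residues that were predicted as non-binding.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all residues classified correctly.
    acc = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts chosen only to illustrate a class-imbalanced case.
example = binary_metrics(tp=40, fp=30, tn=900, fn=30)
```

Because binding residues are typically a small minority, accuracy and specificity can be high while sensitivity stays modest, which is one reason MCC is commonly reported alongside them.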
Pages: 170-185
Number of pages: 16