Identification of DNA-protein binding residues through integration of Transformer encoder and Bi-directional Long Short-Term Memory

Times cited: 0
Authors
Zhao, Haipeng [1]
Zhu, Baozhong [1]
Jiang, Tengsheng [2]
Cui, Zhiming [1]
Wu, Hongjie [1]
Affiliations
[1] Suzhou Univ Sci & Technol, Sch Elect & Informat Engn, Suzhou, Peoples R China
[2] Nanjing Med Univ, Gusu Sch, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA-protein binding residues identification; Transformer encoder; BiLSTM; deep learning; STRUCTURE-BASED PREDICTION; SITES;
DOI
10.3934/mbe.2024008
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
DNA-protein binding is crucial for the normal development and function of organisms. Accurately identifying DNA-protein binding sites matters because of its role in disease prevention and in the development of innovative approaches to disease treatment. In the present study, we introduce a precise and robust identifier for DNA-protein binding residues. For protein representation, we combine the evolutionary information of the protein, represented by its position-specific scoring matrix, with the spatial information of the protein's secondary structure, enriching the overall informational content. The approach first employs a combination of Bi-directional Long Short-Term Memory and a Transformer encoder to jointly extract the interdependencies among residues within the protein sequence. Convolutional operations are then applied to the resulting feature matrix to capture local features of the residues. Experimental results on the benchmark dataset demonstrate that our method is competitive with contemporary classifiers. Specifically, our method achieved an MCC of 0.349, SP of 96.50%, SN of 44.03%, and ACC of 94.59% on the PDNA-41 dataset.
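The four scores quoted in the abstract (MCC, SP, SN, ACC) are conventionally derived from a binary confusion matrix over residues. The sketch below uses the standard textbook definitions and assumes the paper follows them; the function name `binary_metrics` and the counts in the usage lines are hypothetical and are not taken from the PDNA-41 experiments.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification scores from confusion-matrix counts.

    tp/fn count binding residues predicted as binding / non-binding;
    tn/fp count non-binding residues predicted as non-binding / binding.
    These are the textbook definitions, assumed (not confirmed) to match
    the ones used in the paper.
    """
    total = tp + fp + tn + fn
    # Sensitivity (SN): fraction of true binding residues that are recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity (SP): fraction of non-binding residues correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy (ACC): fraction of all residues labelled correctly.
    acc = (tp + tn) / total if total else 0.0
    # Matthews correlation coefficient (MCC): balanced score in [-1, 1];
    # defined as 0 here when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts, for illustration only.
scores = binary_metrics(tp=40, fp=30, tn=900, fn=30)
print({name: round(value, 3) for name, value in scores.items()})
```

Because binding residues are typically a small minority of all residues, ACC and SP can be high even when SN is modest, which is why MCC is usually reported alongside them.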
Pages: 170-185
Page count: 16
Related papers
41 in total
  • [1] DNA deformation energy as an indirect recognition mechanism in protein-DNA interactions
    Aeling, Kimberly A.
    Steffen, Nicholas R.
    Johnson, Matthew
    Hatfield, G. Wesley
    Lathrop, Richard H.
    Senear, Donald F.
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (01) : 117 - 125
  • [2] Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
    Altschul, SF
    Madden, TL
    Schaffer, AA
    Zhang, JH
    Zhang, Z
    Miller, W
    Lipman, DJ
    [J]. NUCLEIC ACIDS RESEARCH, 1997, 25 (17) : 3389 - 3402
  • [3] UniProt: a worldwide hub of protein knowledge
    Bateman, Alex
    Martin, Maria-Jesus
    Orchard, Sandra
    Magrane, Michele
    Alpi, Emanuele
    Bely, Benoit
    Bingley, Mark
    Britto, Ramona
    Bursteinas, Borisas
    Busiello, Gianluca
    Bye-A-Jee, Hema
    Da Silva, Alan
    De Giorgi, Maurizio
    Dogan, Tunca
    Castro, Leyla Garcia
    Garmiri, Penelope
    Georghiou, George
    Gonzales, Daniel
    Gonzales, Leonardo
    Hatton-Ellis, Emma
    Ignatchenko, Alexandr
    Ishtiaq, Rizwan
    Jokinen, Petteri
    Joshi, Vishal
    Jyothi, Dushyanth
    Lopez, Rodrigo
    Luo, Jie
    Lussi, Yvonne
    MacDougall, Alistair
    Madeira, Fabio
    Mahmoudy, Mahdi
    Menchi, Manuela
    Nightingale, Andrew
    Onwubiko, Joseph
    Palka, Barbara
    Pichler, Klemens
    Pundir, Sangya
    Qi, Guoying
    Raj, Shriya
    Renaux, Alexandre
    Lopez, Milagros Rodriguez
    Saidi, Rabie
    Sawford, Tony
    Shypitsyna, Aleksandra
    Speretta, Elena
    Turner, Edward
    Tyagi, Nidhi
    Vasudev, Preethi
    Volynkin, Vladimir
    Wardell, Tony
    [J]. NUCLEIC ACIDS RESEARCH, 2019, 47 (D1) : D506 - D515
  • [4] Genomic repertoires of DNA-binding transcription factors across the tree of life
    Charoensawan, Varodom
    Wilson, Derek
    Teichmann, Sarah A.
    [J]. NUCLEIC ACIDS RESEARCH, 2010, 38 (21) : 7364 - 7377
  • [5] DCAMCP: A deep learning model based on capsule network and attention mechanism for molecular carcinogenicity prediction
    Chen, Zhe
    Zhang, Li
    Sun, Jianqiang
    Meng, Rui
    Yin, Shuaidong
    Zhao, Qi
    [J]. JOURNAL OF CELLULAR AND MOLECULAR MEDICINE, 2023, 27 (20) : 3117 - 3126
  • [6] ProteDNA: a sequence-based predictor of sequence-specific DNA-binding residues in transcription factors
    Chu, Wen-Yi
    Huang, Yu-Feng
    Huang, Chun-Chin
    Cheng, Yi-Sheng
    Huang, Chien-Kang
    Oyang, Yen-Jen
    [J]. NUCLEIC ACIDS RESEARCH, 2009, 37 : W396 - W401
  • [7] Long Non-Coding RNAs in the Regulation of Gene Expression: Physiology and Disease
    Fernandes, Juliane C. R.
    Acuna, Stephanie M.
    Aoki, Juliana I.
    Floeter-Winter, Lucile M.
    Muxel, Sandra M.
    [J]. NON-CODING RNA, 2019, 5 (01)
  • [8] Predicting Protein-DNA Binding Residues by Weightedly Combining Sequence-Based Features and Boosting Multiple SVMs
    Hu, Jun
    Li, Yang
    Zhang, Ming
    Yang, Xibei
    Shen, Hong-Bin
    Yu, Dong-Jun
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2017, 14 (06) : 1389 - 1398
  • [9] DP-Bind: a Web server for sequence-based prediction of DNA-binding residues in DNA-binding proteins
    Hwang, Seungwoo
    Gou, Zhenkun
    Kuznetsov, Igor B.
    [J]. BIOINFORMATICS, 2007, 23 (05) : 634 - 636
  • [10] Current successes and remaining challenges in protein function prediction
    Jeffery, Constance J.
    [J]. FRONTIERS IN BIOINFORMATICS, 2023, 3