EmbedCaps-DBP: Predicting DNA-Binding Proteins Using Protein Sequence Embedding and Capsule Network

Times cited: 3
Authors
Naim, Muhammad Khaerul [1, 3]
Mengko, Tati Rajab [1]
Hertadi, Rukman [2]
Purwarianti, Ayu [1, 4]
Susanty, Meredita [1, 5]
Affiliations
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung 40132, Indonesia
[2] Bandung Inst Technol, Fac Math & Nat Sci, Bandung 40132, Indonesia
[3] Universal Univ, Dept Informat Engn, Batam 29433, Indonesia
[4] Bandung Inst Technol, Ctr Artificial Intelligence U CoE AI VLB, Bandung 40132, Indonesia
[5] Univ Pertamina, Dept Comp Sci, Jakarta 12220, Indonesia
Source
IEEE ACCESS | 2023 / Volume 11
Keywords
Protein sequence; Training; Amino acids; Transformers; Feature extraction; Task analysis; Biological system modeling; DNA; Machine learning; Capsule network; DNA-binding proteins; deep learning; machine learning; protein sequence embeddings; IDENTIFICATION; RESIDUES; PSEAAC; DPP;
DOI
10.1109/ACCESS.2023.3328960
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
DNA-binding interactions are an essential biological activity with important functions, such as DNA replication, transcription, repair, and recombination. DNA-binding proteins (DBPs) have been strongly associated with various human diseases, such as asthma, cancer, and HIV/AIDS. Therefore, some DBPs are used in the pharmaceutical industry to produce antibiotics, anticancer drugs, and anti-inflammatory drugs. Most previous methods have used evolutionary information to predict DBPs. However, these methods are computationally expensive and produce unsatisfactory results. This study presents EmbedCaps-DBP, a new method for improving DBP prediction. First, we used three protein sequence embeddings (ProtT5, ESM-1b, and ESM-2) to extract learned feature representations from protein sequences. These embedding methods can capture important information about amino acids, such as biophysics, biochemistry, structure, and domains, that has not been fully utilized in protein annotation tasks. Then, we used a 1D-capsule network (CapsNet) as a classifier. EmbedCaps-DBP significantly outperformed all existing classifiers on both the training and independent datasets. On two independent datasets, PDB2272 and PDB186, EmbedCaps-DBP (ProtT5) achieved 12.65% and 0.33% higher accuracy, respectively, than a recent predictor. These results indicate that our proposed method is a promising predictor of DBPs.
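The two-stage idea in the abstract (sequence embeddings as features, then a capsule network as classifier) can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption: `mean_pool` is a hypothetical way to turn per-residue embedding vectors into one fixed-length protein vector, and `squash` is the standard capsule nonlinearity from the general CapsNet literature, not necessarily the exact variant used in EmbedCaps-DBP. The toy vectors stand in for real ProtT5 or ESM outputs.

```python
import math

def mean_pool(residue_embeddings):
    """Average per-residue vectors (L x D) into one D-dimensional vector.
    Assumption: mean pooling is used only for illustration here."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]

def squash(s, eps=1e-9):
    """Standard capsule squash: keeps the direction of s and maps its
    length into [0, 1), so the length can be read as a probability."""
    sq_norm = sum(x * x for x in s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm + eps)
    return [scale * x for x in s]

def capsule_length(v):
    """Euclidean length of a capsule output vector."""
    return math.sqrt(sum(x * x for x in v))

# Toy stand-in for a 2-residue protein with 2-dimensional embeddings.
toy_embeddings = [[3.0, 0.0], [3.0, 8.0]]
pooled = mean_pool(toy_embeddings)   # pooled protein-level vector
capsule = squash(pooled)             # capsule output vector
print(capsule_length(capsule))       # always strictly below 1
```

In a real capsule classifier, one output capsule per class would be produced by learned transformations and routing, and the class whose capsule has the greatest length would be predicted; that part is omitted here.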
Pages: 121256-121268
Number of pages: 13
Related papers
50 in total
  • [1] BiCaps-DBP: Predicting DNA-binding proteins from protein sequences using Bi-LSTM and a 1D-capsule network
    Mursalim, Muhammad K. N.
    Mengko, Tati L. E. R.
    Hertadi, Rukman
    Purwarianti, Ayu
    Susanty, Meredita
    COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 163
  • [2] Multi-Scale Capsule Network for Predicting DNA-Protein Binding Sites
    Zhang, Qinhu
    Yu, Wenbo
    Han, Kyungsook
    Nandi, Asoke K.
    Huang, De-Shuang
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2021, 18 (05) : 1793 - 1800
  • [3] A Useful Tool for the Identification of DNA-binding Proteins Using Graph Convolutional Network
    Chen, Dasheng
    Wei, Leyi
    CURRENT PROTEOMICS, 2021, 18 (05) : 661 - 668
  • [4] The adenovirus DNA-binding protein DBP
    Bertzbach, Luca D.
    Seddar, Laura
    von Stromberg, Konstantin
    Ip, Wing-Hang
    Dobner, Thomas
    Hidalgo, Paloma
    JOURNAL OF VIROLOGY, 2024, 98 (02)
  • [5] CoSEF-DBP: Convolution scope expanding fusion network for identifying DNA-binding proteins through bilingual representations
    Zhang, Hua
    Yang, Xiaoqi
    Chen, Pengliang
    Yang, Cheng
    Chen, Bi
    Jiang, Bo
    Shan, Guogen
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 263
  • [6] Using hidden Markov models to predict DNA-binding proteins with sequence and structure information
    Hsu, Yi-Yu
    Chen, Wei-Jhih
    Chen, Shu-Hui
    Kao, Hung-Yu
    SOFT COMPUTING, 2014, 18 (12) : 2365 - 2376
  • [7] Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features
    Mahmud, S. M. Hasan
    Goh, Kah Ong Michael
    Hosen, Md. Faruk
    Nandi, Dip
    Shoombuatong, Watshara
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [8] Sequence-based prediction of DNA-binding sites on DNA-binding proteins
    Gou, Z.
    Hwang, S.
    Kuznetsov, B., I
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1, 2006, : 268 - +
  • [9] A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach
    Cai, Yudong
    He, ZhiSong
    Shi, Xiaohe
    Kong, Xiangying
    Gu, Lei
    Xie, Lu
    MOLECULES AND CELLS, 2010, 30 (02) : 99 - 105
  • [10] A sequence-based multiple kernel model for identifying DNA-binding proteins
    Qian, Yuqing
    Jiang, Limin
    Ding, Yijie
    Tang, Jijun
    Guo, Fei
    BMC BIOINFORMATICS, 2021, 22 (SUPPL 3)