EmbedCaps-DBP: Predicting DNA-Binding Proteins Using Protein Sequence Embedding and Capsule Network

Cited by: 3
Authors
Naim, Muhammad Khaerul [1, 3]
Mengko, Tati Rajab [1]
Hertadi, Rukman [2]
Purwarianti, Ayu [1, 4]
Susanty, Meredita [1, 5]
Affiliations
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung 40132, Indonesia
[2] Bandung Inst Technol, Fac Math & Nat Sci, Bandung 40132, Indonesia
[3] Universal Univ, Dept Informat Engn, Batam 29433, Indonesia
[4] Bandung Inst Technol, Ctr Artificial Intelligence U CoE AI VLB, Bandung 40132, Indonesia
[5] Univ Pertamina, Dept Comp Sci, Jakarta 12220, Indonesia
Source
IEEE ACCESS | 2023 / Vol. 11
Keywords
Protein sequence; Training; Amino acids; Transformers; Feature extraction; Task analysis; Biological system modeling; DNA; Machine learning; Capsule network; DNA-binding proteins; deep learning; machine learning; protein sequence embeddings; IDENTIFICATION; RESIDUES; PSEAAC; DPP;
DOI
10.1109/ACCESS.2023.3328960
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
DNA-binding interactions are an essential biological activity with important functions, such as DNA replication, transcription, repair, and recombination. DNA-binding proteins (DBPs) have been strongly associated with various human diseases, such as asthma, cancer, and HIV/AIDS. Therefore, some DBPs are used in the pharmaceutical industry to produce antibiotics, anticancer drugs, and anti-inflammatory drugs. Most previous methods have used evolutionary information to predict DBPs. However, these methods have high computing costs and produce unsatisfactory results. This study presents EmbedCaps-DBP, a new method for improving DBP prediction. First, we used three protein sequence embeddings (ProtT5, ESM-1b, and ESM-2) to extract learned feature representations from protein sequences. Those embedding methods can capture important information about amino acids, such as biophysics, biochemistry, structure, and domains, that have not been fully utilized in protein annotation tasks. Then, we used a 1D-capsule network (CapsNet) as a classifier. EmbedCaps-DBP significantly outperformed all existing classifiers in training and independent datasets. Based on two independent datasets, EmbedCaps-DBP (ProtT5) achieved 12.65% and 0.33% higher accuracies than a recent predictor on PDB2272 and PDB186, respectively. These results indicate that our proposed method is a promising predictor of DBPs.
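The abstract describes feeding a learned sequence embedding into a capsule-network classifier but does not give the architecture. As a rough illustration only, the sketch below shows a generic capsule layer with the squash non-linearity and routing-by-agreement, written in plain Python. It is not the authors' implementation: the random `embedding` is a stand-in for a pooled ProtT5 or ESM vector, and the names `capsule_scores`, `route`, `D_EMBED`, `D_IN`, `N_OUT`, and `D_OUT`, along with the toy sizes and random weights, are assumptions made for this example.

```python
import math
import random

def squash(s):
    """Capsule squash: keep the direction of s, map its length into [0, 1)."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0] * len(s)
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, n_iter=3):
    """Routing-by-agreement.

    u_hat[i][j] is input capsule i's prediction vector for output capsule j.
    Returns one output vector per output capsule.
    """
    n_in, n_out, d_out = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for _ in range(n_iter):
        c = [softmax(row) for row in b]  # coupling coefficients per input capsule
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_in))
                   for k in range(d_out)]
            v.append(squash(s_j))
        # Raise the logit where a prediction agrees with the output capsule.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v

def capsule_scores(embedding, weights, d_in, n_iter=3):
    """Split the embedding into capsules of size d_in, route them to class
    capsules, and return each class capsule's length as its score.

    weights[i][j] is a d_out x d_in matrix stored as a list of rows.
    """
    caps = [squash(embedding[k:k + d_in]) for k in range(0, len(embedding), d_in)]
    u_hat = [[[sum(w * x for w, x in zip(row, u)) for row in weights[i][j]]
              for j in range(len(weights[i]))]
             for i, u in enumerate(caps)]
    v = route(u_hat, n_iter)
    return [math.sqrt(sum(x * x for x in vj)) for vj in v]

random.seed(0)
D_EMBED, D_IN, N_OUT, D_OUT = 64, 8, 2, 4  # toy sizes, assumed for illustration
embedding = [random.gauss(0, 1) for _ in range(D_EMBED)]  # stand-in embedding
weights = [[[[random.gauss(0, 0.1) for _ in range(D_IN)] for _ in range(D_OUT)]
            for _ in range(N_OUT)] for _ in range(D_EMBED // D_IN)]
scores = capsule_scores(embedding, weights, D_IN)  # one score per class
```

In a trained model the two capsule lengths in `scores` would be read as the evidence for the DBP and non-DBP classes; here the weights are random, so the values only demonstrate the data flow.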
Pages: 121256-121268
Number of pages: 13