A deep learning-based method for the prediction of DNA interacting residues in a protein

Times cited: 11
Authors
Patiyal, Sumeet [1]
Dhall, Anjali [1]
Raghava, Gajendra P. S. [1]
Affiliation
[1] Indraprastha Institute of Information Technology, Department of Computational Biology, A-302, R&D Block, Okhla Industrial Estate, Phase 3, New Delhi 110020, India
Keywords
DNA-binding residues; deep learning; 1D-CNN; machine learning; evolutionary profiles; nucleic acid recognition; binding residues; web server; sequence; sites; energy; paradigm; dynamics; features; database
DOI
10.1093/bib/bbac322
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
DNA-protein interactions are among the most crucial interactions in biological systems, determining the fate of many processes such as gene transcription, regulation and splicing. In this study, we trained our models on a training dataset of 646 DNA-binding proteins containing 15 636 DNA-interacting and 298 503 non-interacting residues. The trained models were evaluated on an independent dataset of 46 DNA-binding proteins containing 965 DNA-interacting and 9911 non-interacting residues. Every protein in the independent dataset shares less than 30% sequence similarity with the proteins in the training dataset. We developed a wide range of models based on traditional machine learning and deep learning (1D-CNN) techniques, using binary profiles, physicochemical properties and Position-Specific Scoring Matrix (PSSM)/evolutionary profiles. Among the traditional machine learning techniques, the eXtreme Gradient Boosting (XGBoost)-based model achieved a maximum area under the receiver operating characteristic curve (AUROC) of 0.77 on the independent dataset using the PSSM profile. The deep learning-based model achieved the highest AUROC, 0.79, on the independent dataset using a combination of all three profiles. We also evaluated existing methods on the same independent dataset and observed that our proposed method outperformed all of them. To assist the scientific community, we developed a standalone software package and a web server, both accessible at https://webs.iiitd.edu.in/raghava/dbpred.
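The abstract does not spell out how residues are turned into model inputs or how AUROC is computed. As a minimal sketch, assuming a sliding window of residues centred on each position (the default window of 17 and the all-zero padding beyond the termini are assumptions, not values taken from the study), the hypothetical helper `binary_profile` below builds a binary (one-hot) profile per residue. The hypothetical `auroc` computes the area under the ROC curve as the probability that a randomly chosen DNA-interacting residue is scored above a randomly chosen non-interacting one. Physicochemical or PSSM features would replace the 20 one-hot columns per window position with other per-residue vectors.

```python
from typing import List, Sequence

# The 20 standard amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def binary_profile(sequence: str, window: int = 17) -> List[List[int]]:
    """Hypothetical helper: one-hot encode a window centred on every residue.

    Returns one flat vector of length window * 20 per residue. Window
    positions beyond either terminus, and non-standard residues, become
    all-zero columns (an assumption for this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    features = []
    for centre in range(len(sequence)):
        vector: List[int] = []
        for pos in range(centre - half, centre + half + 1):
            column = [0] * len(AMINO_ACIDS)
            if 0 <= pos < len(sequence) and sequence[pos] in AA_INDEX:
                column[AA_INDEX[sequence[pos]]] = 1
            vector.extend(column)
        features.append(vector)
    return features


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUROC as P(positive score > negative score).

    Ties count as one half, as in the Mann-Whitney U formulation.
    labels: 1 for a DNA-interacting residue, 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy sequence and made-up scores, for illustration only.
    X = binary_profile("MKRIEL", window=5)
    print(len(X), len(X[0]))  # 6 100
    labels = [0, 1, 1, 0, 0, 1]
    scores = [0.1, 0.8, 0.3, 0.35, 0.2, 0.9]
    # 8 of 9 positive-negative pairs are ranked correctly.
    print(auroc(labels, scores))  # 0.8888888888888888
```

The pairwise loop in `auroc` is quadratic, which keeps the definition visible but is slow at the scale of the independent set (965 × 9911 residue pairs); a rank-based implementation such as scikit-learn's `roc_auc_score` is the practical choice there.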
Pages: 12