DNAgenie: accurate prediction of DNA-type-specific binding residues in protein sequences

Cited by: 18
Authors
Zhang, Jian [1]
Ghadermarzi, Sina [2]
Katuwawala, Akila [2]
Kurgan, Lukasz [2]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, 237 Nanhu Rd, Xinyang 464000, Henan, Peoples R China
[2] Virginia Commonwealth Univ, Comp Sci, Richmond, VA 23284 USA
Funding
National Natural Science Foundation of China
Keywords
protein-DNA interactions; DNA-binding residues; A-DNA; B-DNA; single-stranded DNA; double-stranded DNA; prediction; sites; database; server; gene
DOI
10.1093/bib/bbab336
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Efforts to elucidate protein-DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. Although over a dozen computational predictors of DNA-binding residues exist, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA-binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, a first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physicochemical profile extracted from the input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis shows that the second (refinement) step substantially reduces cross-predictions. Empirical tests show that DNAgenie's outputs, when converted into coarse-grained protein-level predictions, compare favorably against recent tools that predict whether DNA-binding proteins interact with double-stranded or single-stranded DNA. Moreover, predictions for the whole human proteome reveal that DNAgenie's results substantially overlap with known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. DNAgenie's webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
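The abstract does not state how residue-level outputs are turned into protein-level double-stranded versus single-stranded calls. The sketch below shows one plausible aggregation rule, assuming per-residue propensities in [0, 1] for each DNA type; the function protein_level_call and the cutoffs THRESHOLD and MIN_RESIDUES are hypothetical illustrations, not DNAgenie's published procedure or values.

```python
from typing import Sequence

# Hypothetical cutoffs for illustration only; they are NOT DNAgenie's values.
THRESHOLD = 0.5      # a residue counts as binding if its propensity >= THRESHOLD
MIN_RESIDUES = 3     # a protein needs at least this many binding residues


def protein_level_call(a_dna: Sequence[float],
                       b_dna: Sequence[float],
                       ss_dna: Sequence[float],
                       threshold: float = THRESHOLD,
                       min_residues: int = MIN_RESIDUES) -> str:
    """Collapse per-residue propensities into 'dsDNA', 'ssDNA' or 'non-binding'.

    Hypothetical aggregation rule, not the method used in the paper.
    """
    if not (len(a_dna) == len(b_dna) == len(ss_dna)):
        raise ValueError("all propensity lists must cover the same residues")
    # A-DNA and B-DNA are both double-stranded forms, so a residue counts toward
    # dsDNA if either of its two propensities reaches the threshold.
    ds_count = sum(max(a, b) >= threshold for a, b in zip(a_dna, b_dna))
    ss_count = sum(s >= threshold for s in ss_dna)
    # Too few putative binding residues of either kind: call it non-binding.
    if max(ds_count, ss_count) < min_residues:
        return "non-binding"
    # Ties are broken in favour of dsDNA (an arbitrary choice for this sketch).
    return "dsDNA" if ds_count >= ss_count else "ssDNA"


if __name__ == "__main__":
    print(protein_level_call([0.9, 0.8, 0.1, 0.7],
                             [0.2, 0.6, 0.9, 0.1],
                             [0.1, 0.2, 0.3, 0.1]))  # prints "dsDNA"
```

Counting residues rather than taking a single maximum score makes the call less sensitive to one spuriously high propensity; any real cutoff would have to be tuned on annotated proteins.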
Pages: 14