HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins

Times cited: 10
Authors
Zhang, Jian [1]
Basu, Sushmita [2]
Kurgan, Lukasz [2]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Peoples R China
[2] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
INTRINSIC DISORDER; 3; DOMAINS; RESIDUES; SITES; RNA; ACCURATE; DATABASE; IDENTIFICATION; INFORMATION; FEATURES;
DOI
10.1093/nar/gkad1131
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on annotations of intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for either the structure-annotated or the disorder-annotated proteins, but none is highly accurate for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, i.e., residues that interact with non-DNA ligands are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine the predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and it is statistically more accurate than each of the ten tools and than baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
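The abstract compares hybridDBRpred with baseline meta-predictors that average the outputs of the input predictors, and it reports cross-predictions, where residues that bind non-DNA ligands are predicted as DBRs. The minimal Python sketch below illustrates those two ideas under stated assumptions: an averaging baseline over per-residue propensities, an AUC computed by pairwise comparison, and one simple way to quantify a cross-prediction rate at a fixed threshold. It is not the hybridDBRpred transformer model or the authors' evaluation code; the function names (`average_meta`, `auc`, `cross_prediction_rate`), the 0.5 threshold, the cross-prediction definition and the toy scores are all hypothetical.

```python
from typing import Sequence


def average_meta(scores: Sequence[Sequence[float]]) -> list[float]:
    """Averaging baseline meta-predictor (illustrative sketch).

    scores[k][i] is the propensity that residue i is a DBR according to
    predictor k; the combined score is the mean over predictors.
    """
    n = len(scores[0])
    if any(len(s) != n for s in scores):
        raise ValueError("all predictors must score the same residues")
    return [sum(s[i] for s in scores) / len(scores) for i in range(n)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    DBR scores higher than a randomly chosen non-DBR (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cross_prediction_rate(scores: Sequence[float],
                          other_ligand: Sequence[int],
                          threshold: float = 0.5) -> float:
    """Hypothetical cross-prediction measure: the fraction of residues that
    bind a non-DNA ligand (e.g. RNA or protein) yet score >= threshold,
    i.e. are predicted as DBRs."""
    binders = [s for s, o in zip(scores, other_ligand) if o]
    if not binders:
        return 0.0
    return sum(1 for s in binders if s >= threshold) / len(binders)


if __name__ == "__main__":
    # Made-up propensities for six residues from three hypothetical predictors.
    predictor_scores = [
        [0.9, 0.8, 0.2, 0.6, 0.1, 0.3],
        [0.7, 0.9, 0.4, 0.5, 0.2, 0.1],
        [0.8, 0.6, 0.3, 0.7, 0.0, 0.2],
    ]
    is_dbr = [1, 1, 0, 0, 0, 0]        # toy DNA-binding annotations
    binds_other = [0, 0, 0, 1, 0, 1]   # toy non-DNA-binding annotations
    combined = average_meta(predictor_scores)
    print(f"AUC = {auc(combined, is_dbr):.3f}")
    print(f"cross-prediction rate = {cross_prediction_rate(combined, binds_other):.3f}")
```

In this toy example the averaged scores rank both DBRs above every non-DBR, yet one of the two non-DNA-binding residues still crosses the threshold, which is the kind of cross-prediction the abstract describes. A learned combiner, such as the logistic regression baseline or the transformer used by hybridDBRpred, can weight the input predictors rather than treating them equally.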
Pages: 13