HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins

Cited by: 10
Authors
Zhang, Jian [1]
Basu, Sushmita [2]
Kurgan, Lukasz [2]
Affiliations
[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Peoples R China
[2] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Funding
US National Science Foundation;
Keywords
INTRINSIC DISORDER; 3; DOMAINS; RESIDUES; SITES; RNA; ACCURATE; DATABASE; IDENTIFICATION; INFORMATION; FEATURES;
DOI
10.1093/nar/gkad1131
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Current predictors of DNA-binding residues (DBRs) from protein sequences fall into two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on annotations from intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for either the structure-annotated proteins or the disorder-annotated proteins, but none is highly accurate for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and than baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
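To make the averaging baseline mentioned in the abstract concrete, the following is a minimal sketch of how per-residue propensities from several predictors could be averaged and then thresholded. It is an illustration only, not the authors' implementation: the function names, the example scores, and the 0.5 cut-off are all assumptions introduced here, and hybridDBRpred itself uses a deep transformer network rather than this simple average.

```python
from statistics import mean


def average_meta_predictor(per_tool_scores):
    """Average per-residue scores across tools (hypothetical baseline).

    per_tool_scores: list of score lists, one list per predictor,
    each holding one propensity per residue of the same protein.
    """
    lengths = {len(scores) for scores in per_tool_scores}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same residues")
    # zip(*...) groups the scores that different tools gave to one residue
    return [mean(column) for column in zip(*per_tool_scores)]


def binarize(scores, threshold=0.5):
    """Label a residue as DNA-binding (1) if its score reaches the cut-off.

    The 0.5 threshold is an assumption made for this example.
    """
    return [1 if s >= threshold else 0 for s in scores]


# Made-up propensities for a three-residue protein from three tools
tool_a = [0.9, 0.2, 0.6]
tool_b = [0.7, 0.1, 0.3]
tool_c = [0.8, 0.3, 0.3]

consensus = average_meta_predictor([tool_a, tool_b, tool_c])
labels = binarize(consensus)
```

With these made-up inputs only the first residue is labelled as DNA-binding. A plain average weights every tool equally regardless of whether the protein is structured or disordered, which is one plausible reason a learned combiner could do better.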
Pages: 13