HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins

Cited by: 10

Authors:

Zhang, Jian [1]

Basu, Sushmita [2]

Kurgan, Lukasz [2]

Affiliations:

[1] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Peoples R China

[2] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA

Source:

NUCLEIC ACIDS RESEARCH | 2023

Funding:

National Science Foundation (USA);

Keywords:

INTRINSIC DISORDER; 3; DOMAINS; RESIDUES; SITES; RNA; ACCURATE; DATABASE; IDENTIFICATION; INFORMATION; FEATURES;

DOI:

10.1093/nar/gkad1131

Chinese Library Classification:

Q5 [Biochemistry]; Q7 [Molecular Biology];

Subject classification codes:

071010 ; 081704 ;

Abstract:

Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on annotations from intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for either the structure-annotated proteins or the disorder-annotated proteins, but none is highly accurate for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and than baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
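
The abstract compares hybridDBRpred against baseline meta-predictors, one of which relies on averaging. A minimal sketch of what such an averaging baseline might look like is given below. It is not the authors' implementation: the function name, the 0.5 threshold, and the example scores are all assumptions made for illustration, and hybridDBRpred itself combines the inputs with a deep transformer network rather than a plain mean.

```python
from statistics import mean


def average_meta_predictor(scores_per_tool, threshold=0.5):
    """Average per-residue scores from several predictors and binarize them.

    scores_per_tool: list of equal-length lists, one per predictor, each
    holding a putative DNA-binding propensity in [0, 1] for every residue.
    threshold: assumed cutoff for calling a residue DNA-binding.
    """
    lengths = {len(scores) for scores in scores_per_tool}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same residues")
    # zip(*...) groups the scores that the tools assign to the same residue.
    averaged = [mean(column) for column in zip(*scores_per_tool)]
    labels = [1 if score >= threshold else 0 for score in averaged]
    return averaged, labels


# Hypothetical propensities for a 5-residue sequence from three tools.
tool_a = [0.9, 0.2, 0.6, 0.1, 0.8]
tool_b = [0.7, 0.1, 0.3, 0.2, 0.9]
tool_c = [0.8, 0.3, 0.3, 0.0, 0.7]

averaged, labels = average_meta_predictor([tool_a, tool_b, tool_c])
print(labels)
```

In this toy input the third residue is called binding by one tool only, so the mean falls below the assumed cutoff and the residue is labeled non-binding. A simple mean weights every tool equally, which is one reason a learned combiner could do better.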

Citations

Pages: 13

27 entries in total

[1] Prediction of protein-binding residues: dichotomy of sequence-based methods developed using structured complexes versus disordered proteins
Zhang, Jian
Ghadermarzi, Sina
Kurgan, Lukasz
BIOINFORMATICS, 2020, 36 (18) : 4729 - 4738
[2] Sequence-Based Prediction of DNA-Binding Residues in Proteins with Conservation and Correlation Information
Ma, Xin
Guo, Jing
Liu, Hong-De
Xie, Jian-Ming
Sun, Xiao
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (06) : 1766 - 1775
[3] A sequence-based multiple kernel model for identifying DNA-binding proteins
Qian, Yuqing
Jiang, Limin
Ding, Yijie
Tang, Jijun
Guo, Fei
BMC BIOINFORMATICS, 2021, 22 (SUPPL 3)
[4] An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis
Zou, Chuanxin
Gong, Jiayu
Li, Honglin
BMC BIOINFORMATICS, 2013, 14
[5] Sequence-based Detection of DNA-binding Proteins using Multiple-View Features Allied with Feature Selection
Zhou, Liling
Song, Xiaoning
Yu, Dong-Jun
Sun, Jun
MOLECULAR INFORMATICS, 2020, 39 (08)
[6] qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids
Wu, Zhonghua
Basu, Sushmita
Wu, Xuantai
Kurgan, Lukasz
PROTEIN SCIENCE, 2023, 32 (01)
[7] A Novel Sequence-Based Feature for the Identification of DNA-Binding Sites in Proteins Using Jensen-Shannon Divergence
Dang, Truong Khanh Linh
Meckbach, Cornelia
Tacke, Rebecca
Waack, Stephan
Gueltas, Mehmet
ENTROPY, 2016, 18 (10)
[8] A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach
Cai, Yudong
He, ZhiSong
Shi, Xiaohe
Kong, Xiangying
Gu, Lei
Xie, Lu
MOLECULES AND CELLS, 2010, 30 (02) : 99 - 105
[9] Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naive Bayes
Lou, Wangchao
Wang, Xiaoqing
Chen, Fan
Chen, Yixiao
Jiang, Bo
Zhang, Hua
PLOS ONE, 2014, 9 (01)
[10] StackDPPred: a stacking based prediction of DNA-binding protein from sequence
Mishra, Avdesh
Pokhrel, Pujan
Hoque, Md Tamjidul
BIOINFORMATICS, 2019, 35 (03) : 433 - 441
