qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids

Times cited: 2
Authors
Wu, Zhonghua [1]
Basu, Sushmita [2]
Wu, Xuantai [1]
Kurgan, Lukasz [2]
Affiliations
[1] Nankai Univ, Sch Math Sci, LPMC, Tianjin, Peoples R China
[2] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Keywords
prediction; protein function; protein-nucleic acids interactions; protein sequence; PROTEIN SECONDARY STRUCTURE; SOLVENT ACCESSIBILITY; FOLDING RATES; RESIDUE FLEXIBILITY; RNA; DNA; SITES; RECOGNITION; FEATURES; DATABASE;
DOI
10.1002/pro.4544
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and methods that predict NA-binding residues. The residue-level tools produce more detail but suffer from high computational cost, since they must make a prediction for every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts the content (fraction) of NA-binding residues, offering more information than protein-level prediction and a much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed, and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence, and on a well-parametrized support vector regression model. We provide two versions of qNABpredict: a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin, and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms (archaea, bacteria, eukaryota, and viruses). Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions than the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at . This new tool should be particularly useful for predicting details of protein-NA interactions for large protein families and proteomes.
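As an illustration of the quantity the abstract refers to, the sketch below computes the content (fraction) of NA-binding residues from a residue-level annotation. It is not part of qNABpredict: the function name `binding_content` and the encoding of annotations as a string of '0' (non-binding) and '1' (binding) characters are assumptions made only for this example.

```python
def binding_content(residue_labels: str) -> float:
    """Return the fraction of residues labelled as NA-binding.

    `residue_labels` is a hypothetical per-residue annotation, one
    character per amino acid: '1' = binding, '0' = non-binding.
    """
    if not residue_labels:
        raise ValueError("annotation must contain at least one residue")
    if set(residue_labels) - {"0", "1"}:
        raise ValueError("annotation may contain only '0' and '1'")
    # Content = number of binding residues / sequence length.
    return residue_labels.count("1") / len(residue_labels)


# A 10-residue toy protein with 3 residues annotated as binding.
example_labels = "0010110000"
example_content = binding_content(example_labels)
print(f"NA-binding content: {example_content:.2f}")
```

The same calculation could presumably be applied to the binarized output of a residue-level predictor, which is how the abstract describes deriving content values for comparison.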
Pages: 15