qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids

Citations: 2
Authors
Wu, Zhonghua [1]
Basu, Sushmita [2]
Wu, Xuantai [1]
Kurgan, Lukasz [2]
Affiliations
[1] Nankai Univ, Sch Math Sci, LPMC, Tianjin, Peoples R China
[2] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Keywords
prediction; protein function; protein-nucleic acids interactions; protein sequence; protein secondary structure; solvent accessibility; folding rates; residue flexibility; RNA; DNA; sites; recognition; features; database
DOI
10.1002/pro.4544
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at . This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
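The notion of "content" used in the abstract can be illustrated with a minimal sketch. It assumes that content is the fraction of residues in a chain annotated as NA-binding, and that the content implied by a residue-level predictor is obtained by binarizing its per-residue scores and averaging. The function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from qNABpredict or its source code.

```python
from typing import Sequence


def binding_content(labels: Sequence[int]) -> float:
    """Fraction of residues labeled as NA-binding (1) in one protein chain."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for x in labels if x == 1) / len(labels)


def content_from_scores(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Content implied by a residue-level predictor.

    Assumption: a residue counts as binding when its score is >= threshold.
    """
    return binding_content([1 if s >= threshold else 0 for s in scores])


# Toy 10-residue example (made-up values, for illustration only).
native_labels = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
residue_scores = [0.1, 0.2, 0.9, 0.4, 0.1, 0.3, 0.2, 0.8, 0.1, 0.1]

native_content = binding_content(native_labels)
derived_content = content_from_scores(residue_scores)
abs_error = abs(derived_content - native_content)

print(f"native content:  {native_content:.2f}")
print(f"derived content: {derived_content:.2f}")
print(f"absolute error:  {abs_error:.2f}")
```

A direct content predictor such as the one described in the abstract would output a single value per protein instead of going through per-residue scores; the absolute error above is just one plausible way to compare the two against the native content.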
Pages: 15