APRICOT: an integrated computational pipeline for the sequence-based identification and characterization of RNA-binding proteins

Cited by: 20
Authors
Sharan, Malvika [1]
Foerstner, Konrad U. [2]
Eulalio, Ana [1]
Vogel, Joerg [1]
Affiliations
[1] Univ Wurzburg, Inst Mol Infect Biol, D-97080 Wurzburg, Germany
[2] Univ Wurzburg, Core Unit Syst Med, D-97080 Wurzburg, Germany
Keywords
FAMILY CLASSIFICATION; FOLD RECOGNITION; PREDICTION; RESIDUES; DATABASE; LOCALIZATION; DISCOVERY; ACCURATE; BIOLOGY; CSRA
DOI
10.1093/nar/gkx137
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
RNA-binding proteins (RBPs) have been established as core components of several post-transcriptional gene regulation mechanisms. Experimental techniques such as cross-linking and co-immunoprecipitation have enabled the large-scale identification of RBPs, RNA-binding domains (RBDs) and their regulatory roles in eukaryotic species such as human and yeast. In contrast, our knowledge of the number and potential diversity of RBPs in bacteria is poorer due to the technical challenges associated with the existing global screening approaches. We introduce APRICOT, a computational pipeline for the sequence-based identification and characterization of proteins using RBDs known from experimental studies. The pipeline identifies functional motifs in protein sequences using position-specific scoring matrices and Hidden Markov Models of the functional domains and statistically scores them based on a series of sequence-based features. Subsequently, APRICOT identifies putative RBPs and characterizes them by several biological properties. Here we demonstrate the application and adaptability of the pipeline on large-scale protein sets, including the bacterial proteome of Escherichia coli. APRICOT showed better performance on various datasets compared to other existing tools for the sequence-based prediction of RBPs, achieving an average sensitivity and specificity of 0.90 and 0.91, respectively. The command-line tool and its documentation are available at https://pypi.python.org/pypi/bio-apricot.
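To make two ideas from the abstract concrete, the sketch below shows how a position-specific scoring matrix can be slid along a protein sequence to find the best-scoring window, and how sensitivity and specificity are computed from predicted and true labels. This is an illustration only and is not taken from APRICOT: the function names, the toy matrix, the penalty for unlisted residues, and the example labels are all assumptions made up for this example.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical 3-column PSSM: one dict of residue -> log-odds score per column.
TOY_PSSM = [
    {"G": 2.0, "A": -1.0},
    {"K": 3.0},
    {"R": 1.5, "K": 1.0},
]


def best_pssm_hit(
    sequence: str,
    pssm: Sequence[Dict[str, float]],
    unknown: float = -4.0,  # assumed penalty for residues missing from a column
) -> Tuple[int, float]:
    """Return (start, score) of the highest-scoring ungapped window."""
    width = len(pssm)
    if width == 0 or len(sequence) < width:
        raise ValueError("sequence must be at least as long as the PSSM")
    best_start, best_score = 0, float("-inf")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-column score of each residue in the window.
        score = sum(col.get(res, unknown) for col, res in zip(pssm, window))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


def sensitivity_specificity(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    print(best_pssm_hit("MAGKRL", TOY_PSSM))
    # Made-up labels: True means "RNA-binding protein".
    print(sensitivity_specificity(
        [True, True, False, False, True],
        [True, False, False, False, True],
    ))
```

Real domain searches use gapped, statistically calibrated profile tools rather than a plain sliding window; the loop above only conveys the scoring idea.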
Pages: 13