ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

Cited by: 24
Authors
Heller, David [1, 2]
Krestel, Ralf [2]
Ohler, Uwe [3]
Vingron, Martin [1]
Marsico, Annalisa [1, 4]
Affiliations
[1] Max Planck Inst Mol Genet, Ihnestr 63-73, D-14195 Berlin, Germany
[2] Hasso Plattner Inst, Prof Dr Helmert Str 2-3, D-14482 Potsdam, Germany
[3] Max Delbruck Ctr, Robert Roessle Str 10, D-13029 Berlin, Germany
[4] Free Univ Berlin, Arnimallee 14, D-14195 Berlin, Germany
Keywords
GENE REGULATORY ELEMENTS; SECONDARY STRUCTURE; DNA; DISCOVERY; SITES; CLIP; MICROPROCESSOR; IDENTIFICATION; RECOGNITION; SPECIFICITY
DOI
10.1093/nar/gkx756
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
RNA-binding proteins (RBPs) play an important role in the post-transcriptional regulation of RNA and recognize their target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take RNA structure only partially into account or employ models that are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling that fully captures the relationship between the RNA sequence and the secondary-structure preference of a given RBP. Unlike previous methods, which output separate logos for sequence and structure, ssHMM directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph, which facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM achieves a high motif recovery rate on synthetic data, recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size; it is considerably faster than MEMERIS and RNAcontext on large datasets and on par with GraphProt. It is freely available on GitHub and as a Docker image.
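The abstract describes a model that couples each nucleotide of a motif with its structural context. As a rough, hypothetical illustration of that idea (not ssHMM's actual implementation, and leaving out the Gibbs-sampling training step entirely), the sketch below scores a motif window on a grid of states indexed by motif position and structural context. The five-letter context alphabet, the function names `uniform_model`, `joint_log_prob` and `forward_prob`, and all probability values are assumptions made for this example. `joint_log_prob` scores a sequence whose structure annotation is given (for instance, by a secondary-structure predictor), while `forward_prob` treats the contexts as hidden and sums over all context paths with the standard HMM forward algorithm.

```python
import math

# Assumed structural-context alphabet (hypothetical for this sketch):
# S = stem, H = hairpin loop, I = internal loop, M = multiloop, E = exterior
CONTEXTS = "SHIME"
NUCLEOTIDES = "ACGU"


def uniform_model(length):
    """Build a toy model with uniform start, transition and emission probabilities."""
    k, n = len(CONTEXTS), len(NUCLEOTIDES)
    start = {c: 1 / k for c in CONTEXTS}
    # trans[i][a][b] = P(context b at motif position i+1 | context a at position i)
    trans = [{a: {b: 1 / k for b in CONTEXTS} for a in CONTEXTS} for _ in range(length - 1)]
    # emit[i][c][x] = P(nucleotide x | motif position i, context c)
    emit = [{c: {x: 1 / n for x in NUCLEOTIDES} for c in CONTEXTS} for _ in range(length)]
    return start, trans, emit


def joint_log_prob(seq, struct, start, trans, emit):
    """log P(seq, struct) when each nucleotide's structural context is given."""
    if not (len(seq) == len(struct) == len(emit)):
        raise ValueError("sequence, structure and model length must match")
    logp = math.log(start[struct[0]]) + math.log(emit[0][struct[0]][seq[0]])
    for i in range(1, len(seq)):
        logp += math.log(trans[i - 1][struct[i - 1]][struct[i]])
        logp += math.log(emit[i][struct[i]][seq[i]])
    return logp


def forward_prob(seq, start, trans, emit):
    """P(seq) with the contexts hidden, summed over all paths (forward algorithm)."""
    alpha = {c: start[c] * emit[0][c][seq[0]] for c in CONTEXTS}
    for i in range(1, len(seq)):
        alpha = {
            b: sum(alpha[a] * trans[i - 1][a][b] for a in CONTEXTS) * emit[i][b][seq[i]]
            for b in CONTEXTS
        }
    return sum(alpha.values())


if __name__ == "__main__":
    start, trans, emit = uniform_model(4)
    # Make hairpin-loop states prefer U (an illustrative, made-up preference).
    for position in emit:
        position["H"] = {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7}
    in_loop = joint_log_prob("UUUU", "HHHH", start, trans, emit)
    in_exterior = joint_log_prob("UUUU", "EEEE", start, trans, emit)
    print(f"log P(UUUU, HHHH) = {in_loop:.3f}, log P(UUUU, EEEE) = {in_exterior:.3f}")
    print(f"P(UUUU), structure marginalized = {forward_prob('UUUU', start, trans, emit):.6f}")
```

With the made-up hairpin preference in place, the same sequence UUUU scores higher when annotated as a hairpin loop than as an exterior loop. That dependence of the nucleotide score on the structural context is the kind of coupling a combined sequence-structure motif is meant to capture.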
Pages: 11004-11018
Number of pages: 15