ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data

Cited by: 24
Authors
Heller, David [1, 2]
Krestel, Ralf [2 ]
Ohler, Uwe [3 ]
Vingron, Martin [1 ]
Marsico, Annalisa [1, 4]
Affiliations
[1] Max Planck Inst Mol Genet, Ihnestr 63-73, D-14195 Berlin, Germany
[2] Hasso Plattner Inst, Prof Dr Helmert Str 2-3, D-14482 Potsdam, Germany
[3] Max Delbruck Ctr, Robert Roessle Str 10, D-13029 Berlin, Germany
[4] Free Univ Berlin, Arnimallee 14, D-14195 Berlin, Germany
Keywords
GENE REGULATORY ELEMENTS; SECONDARY STRUCTURE; DNA; DISCOVERY; SITES; CLIP; MICROPROCESSOR; IDENTIFICATION; RECOGNITION; SPECIFICITY;
DOI
10.1093/nar/gkx756
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account or employ models that are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling, which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Unlike previous methods, which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size; it is considerably faster than MEMERIS and RNAcontext on large datasets and on par with GraphProt. It is freely available on GitHub and as a Docker image.
Pages: 11004-11018
Page count: 15