groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data

Cited by: 42
Authors
Chae, Minho [1, 2]
Danko, Charles G. [3]
Kraus, W. Lee [1, 2]
Affiliations
[1] Univ Texas SW Med Ctr Dallas, Cecil H & Ida Green Ctr Reprod Biol Sci, Lab Signaling & Gene Regulat, Dallas, TX 75390 USA
[2] Univ Texas SW Med Ctr Dallas, Basic Res Div, Dept Obstet & Gynecol, Dallas, TX 75390 USA
[3] Cornell Univ, James A Baker Inst Anim Hlth, Coll Vet Med, Ithaca, NY 14853 USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
GRO-seq; groHMM; Transcription; Transcription unit; Primary transcript; Gene regulation; Peak calling; Cell type specificity; Enhancer; Primary miRNAs; Long non-coding RNAs (lncRNAs); Enhancer RNAs (eRNAs); ChIP-seq; CHIP-SEQ DATA; CIS-REGULATORY MODULES; RECEPTOR BINDING-SITES; RNA-POLYMERASE; ACTIVE ENHANCERS; DISTINCT CLASSES; GENE-EXPRESSION; HUMAN PROMOTERS; NONCODING RNAS; GENOME;
DOI
10.1186/s12859-015-0656-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Global run-on coupled with deep sequencing (GRO-seq) provides extensive information on the location and function of coding and non-coding transcripts, including primary microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and enhancer RNAs (eRNAs), as well as yet undiscovered classes of transcripts. However, few computational tools tailored toward this new type of sequencing data are available, limiting the applicability of GRO-seq data for identifying novel transcription units. Results: Here, we present groHMM, a computational tool in R, which defines the boundaries of transcription units de novo using a two-state hidden Markov model (HMM). A systematic comparison of the performance between groHMM and two existing peak-calling methods tuned to identify broad regions (SICER and HOMER) favorably supports our approach on existing GRO-seq data from MCF-7 breast cancer cells. To demonstrate the broader utility of our approach, we have used groHMM to annotate a diverse array of transcription units (i.e., primary transcripts) from four GRO-seq data sets derived from cells representing a variety of different human tissue types, including non-transformed cells (cardiomyocytes and lung fibroblasts) and transformed cells (LNCaP and MCF-7 cancer cells), as well as non-mammalian cells (from flies and worms). As an example of the utility of groHMM and its application to questions about the transcriptome, we show how groHMM can be used to analyze cell type-specific enhancers as defined by newly annotated enhancer transcripts. Conclusions: Our results show that groHMM can reveal new insights into cell type-specific transcription by identifying novel transcription units, and serve as a complete and useful tool for evaluating functional genomic elements in cells.
Pages: 16