Improved definition of the mouse transcriptome via targeted RNA sequencing

Cited by: 23
Authors
Bussotti, Giovanni [1, 7]
Leonardi, Tommaso [1, 2]
Clark, Michael B. [2, 3]
Mercer, Tim R. [4]
Crawford, Joanna [5]
Malquori, Lorenzo [5]
Notredame, Cedric [6]
Dinger, Marcel E. [2, 4]
Mattick, John S. [2, 4]
Enright, Anton J. [1]
Affiliations
[1] EMBL, European Bioinformat Inst, Cambridge CB10 1SD, England
[2] Garvan Inst Med Res, Sydney, NSW 2010, Australia
[3] Univ Oxford, Dept Physiol Anat & Genet, MRC Funct Genom Unit, Oxford OX1 3PT, England
[4] UNSW Australia, St Vincents Clin Sch, Sydney, NSW 2052, Australia
[5] Univ Queensland, Inst Mol Biosci, Brisbane, Qld 4072, Australia
[6] CRG, Comparat Bioinformat Bioinformat & Genom Program, Barcelona 08003, Spain
[7] Inst Pasteur, Hub Bioinformat & Biostat, C3BI, F-75724 Paris 15, France
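Funding
UK Medical Research Council; National Health and Medical Research Council of Australia; UK Biotechnology and Biological Sciences Research Council
Keywords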
LONG NONCODING RNAS; LINKED MENTAL-RETARDATION; GENOME ANNOTATION; TM4SF2; GENE; SEQ DATA; REVEALS; ALIGNMENT; RECONSTRUCTION; DISCOVERY; BROWSER;
DOI
10.1101/gr.199760.115
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Targeted RNA sequencing (CaptureSeq) uses oligonucleotide probes to capture RNAs for sequencing, providing enriched read coverage, accurate measurement of gene expression, and quantitative expression data. We applied CaptureSeq to refine transcript annotations in the current murine GRCm38 assembly. More than 23,000 regions corresponding to putative or annotated long noncoding RNAs (lncRNAs) and 154,281 known splicing junction sites were selected for targeted sequencing across five mouse tissues and three brain subregions. The results illustrate that the mouse transcriptome is considerably more complex than previously thought. We assemble more complete transcript isoforms than GENCODE, expand transcript boundaries, and connect interspersed islands of mapped reads. We describe a novel filtering pipeline that identifies previously unannotated but high-quality transcript isoforms. In this set, 911 GENCODE neighboring genes are condensed into 400 expanded gene models. Additionally, 594 GENCODE lncRNAs acquire an open reading frame (ORF) when their structure is extended with CaptureSeq. Finally, we validate our observations using current FANTOM and Mouse ENCODE resources.
Pages: 705-716
Page count: 12
Related papers
80 entries in total
[1]   A novel 2 bp deletion in the TM4SF2 gene is associated with MRX58 [J].
Abidi, FE ;
Holinski-Feder, E ;
Rittinger, O ;
Kooy, F ;
Lubs, HA ;
Stevenson, RE ;
Schwartz, CE .
JOURNAL OF MEDICAL GENETICS, 2002, 39 (06) :430-433
[2]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[3]   lncRNAdb: a reference database for long noncoding RNAs [J].
Amaral, Paulo P. ;
Clark, Michael B. ;
Gascoigne, Dennis K. ;
Dinger, Marcel E. ;
Mattick, John S. .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D146-D151
[4]   Complex architecture and regulated expression of the Sox2ot locus during vertebrate development [J].
Amaral, Paulo P. ;
Neyt, Christine ;
Wilkins, Simon J. ;
Askarian-Amiri, Marjan E. ;
Sunkin, Susan M. ;
Perkins, Andrew C. ;
Mattick, John S. .
RNA, 2009, 15 (11) :2013-2027
[5]   HTSeq-a Python framework to work with high-throughput sequencing data [J].
Anders, Simon ;
Pyl, Paul Theodor ;
Huber, Wolfgang .
BIOINFORMATICS, 2015, 31 (02) :166-169
[6]   The external RNA controls consortium: a progress report [J].
Baker, SC ;
Bauer, SR ;
Beyer, RP ;
Brenton, JD ;
Bromley, B ;
Burrill, J ;
Causton, H ;
Conley, MP ;
Elespuru, R ;
Fero, M ;
Foy, C ;
Fuscoe, J ;
Gao, XL ;
Gerhold, DL ;
Gilles, P ;
Goodsaid, F ;
Guo, X ;
Hackett, J ;
Hockett, RD ;
Ikonomi, P ;
Irizarry, RA ;
Kawasaki, ES ;
Kaysser-Kranich, T ;
Kerr, K ;
Kiser, G ;
Koch, WH ;
Lee, KY ;
Liu, CM ;
Liu, ZL ;
Lucas, A ;
Manohar, CF ;
Miyada, G ;
Modrusan, Z ;
Parkes, H ;
Puri, RK ;
Reid, L ;
Ryder, TB ;
Salit, M ;
Samaha, RR ;
Scherf, U ;
Sendera, TJ ;
Setterquist, RA ;
Shi, LM ;
Shippy, R ;
Soriano, JV ;
Wagar, EA ;
Warrington, JA ;
Williams, M ;
Wilmer, F ;
Wilson, M .
NATURE METHODS, 2005, 2 (10) :731-734
[7]   A Transcriptomic Atlas of Mouse Neocortical Layers [J].
Belgard, T. Grant ;
Marques, Ana C. ;
Oliver, Peter L. ;
Abaan, Hatice Ozel ;
Sirey, Tamara M. ;
Hoerder-Suabedissen, Anna ;
Garcia-Moreno, Fernando ;
Molnar, Zoltan ;
Margulies, Elliott H. ;
Ponting, Chris P. .
NEURON, 2011, 71 (04) :605-616
[8]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[9]   GenBank [J].
Benson, Dennis A. ;
Clark, Karen ;
Karsch-Mizrachi, Ilene ;
Lipman, David J. ;
Ostell, James ;
Sayers, Eric W. .
NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) :D32-D37
[10]  
Bryant Douglas W Jr, 2012, Methods Mol Biol, V883, P97, DOI 10.1007/978-1-61779-839-9_7