Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression

Cited by: 53
Authors
Raghupathy, Narayanan [1]
Choi, Kwangbom [1]
Vincent, Matthew J. [1]
Beane, Glen L. [1]
Sheppard, Keith S. [1]
Munger, Steven C. [1]
Korstanje, Ron [1]
Pardo-Manuel de Villena, Fernando [2]
Churchill, Gary A. [1]
Affiliations
[1] Jackson Lab, 600 Main St, Bar Harbor, ME 04609 USA
[2] Univ N Carolina, Dept Genet, Chapel Hill, NC 27514 USA
Keywords
ALIGNMENT;
DOI
10.1093/bioinformatics/bty078
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation.

Results: Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects.
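
Illustration: the hierarchical allocation described in the abstract (genes first, then isoforms, then alleles) can be sketched as a toy expectation-maximization loop. This is a minimal sketch under stated assumptions, not the EMASE implementation: the function names `allocate` and `em`, the representation of a read as a list of `(gene, isoform, allele)` tuples, the uniform starting values and the example reads are all hypothetical, and transcript-length normalization is omitted.

```python
from collections import defaultdict


def allocate(hits, gene_w, iso_w, allele_w):
    """Split one read across its compatible (gene, isoform, allele) targets.

    Ambiguity is resolved gene first, then isoform within gene, then allele
    within isoform. The returned weights sum to 1.
    """
    weights = {}
    genes = {g for g, _, _ in hits}
    gene_total = sum(gene_w[g] for g in genes)
    for g in genes:
        p_gene = gene_w[g] / gene_total
        isoforms = {i for gg, i, _ in hits if gg == g}
        iso_total = sum(iso_w[(g, i)] for i in isoforms)
        for i in isoforms:
            p_iso = iso_w[(g, i)] / iso_total
            alleles = {a for gg, ii, a in hits if (gg, ii) == (g, i)}
            allele_total = sum(allele_w[(g, i, a)] for a in alleles)
            for a in alleles:
                p_allele = allele_w[(g, i, a)] / allele_total
                weights[(g, i, a)] = p_gene * p_iso * p_allele
    return weights


def em(reads, n_iter=50):
    """Toy EM: alternate hierarchical allocation and abundance updates."""
    targets = {t for hits in reads for t in hits}
    counts = {t: 1.0 for t in targets}  # assumption: uniform start
    for _ in range(n_iter):
        # Aggregate current expected counts to gene and isoform level.
        gene_w = defaultdict(float)
        iso_w = defaultdict(float)
        for (g, i, a), c in counts.items():
            gene_w[g] += c
            iso_w[(g, i)] += c
        # Re-allocate every read using the aggregated weights.
        new_counts = defaultdict(float)
        for hits in reads:
            for t, w in allocate(hits, gene_w, iso_w, counts).items():
                new_counts[t] += w
        counts = {t: new_counts[t] for t in targets}
    return counts


# Hypothetical example: six reads, some ambiguous between alleles or genes.
reads = [
    [("G1", "T1", "A"), ("G1", "T1", "B")],  # allele-level multi-read
    [("G1", "T1", "A")],
    [("G1", "T1", "A")],
    [("G1", "T1", "B")],
    [("G1", "T1", "A"), ("G2", "T2", "A")],  # gene-level multi-read
    [("G2", "T2", "A")],
]
counts = em(reads)
```

Because a gene-level multi-read is split using total gene abundance before isoforms or alleles are considered, the gene-ambiguous read above is weighted toward G1 as a whole rather than toward any single allele-specific target. Each read contributes a total weight of 1, so the expected counts sum to the number of reads.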
Pages: 2177-2184
Page count: 8