Accurate quantification of transcriptome from RNA-Seq data by effective length normalization

Cited by: 77
Authors
Lee, Soohyun [1]
Seo, Chae Hwa [1]
Lim, Byungho [2]
Yang, Jin Ok [1]
Oh, Jeongsu [1]
Kim, Minjin [2]
Lee, Sooncheol [2]
Lee, Byungwook [1]
Kang, Changwon [2]
Lee, Sanghyuk [1, 3]
Affiliations
[1] KRIBB, Korean Bioinformat Ctr KOBIC, 111 Gwahangno, Taejon 305806, South Korea
[2] Korea Adv Inst Sci & Technol, Dept Biol Sci, Taejon 305701, South Korea
[3] Ewha Womans Univ, Div Life & Pharmaceut Sci, ERCSB, Seoul 120750, South Korea
Keywords
EXPRESSION;
DOI
10.1093/nar/gkq1015
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
We propose a novel, efficient and intuitive approach to estimating mRNA abundances from whole transcriptome shotgun sequencing (RNA-Seq) data. Our method, NEUMA (Normalization by Expected Uniquely Mappable Area), is based on effective length normalization using uniquely mappable areas of gene and mRNA isoform models. Using a known transcriptome sequence model such as RefSeq, NEUMA pre-computes the numbers of all possible gene-wise and isoform-wise informative reads: the former being sequences mapped to all mRNA isoforms of a single gene exclusively and the latter uniquely mapped to a single mRNA isoform. The results are used to estimate the effective length of genes and transcripts, taking experimental distributions of fragment size into consideration. Quantitative RT-PCR based on 27 randomly selected genes in two human cell lines and computer simulation experiments demonstrated superior accuracy of NEUMA over other recently developed methods. NEUMA covers a large proportion of genes and mRNA isoforms and offers a measure of consistency ('consistency coefficient') for each gene between an independently measured gene-wise level and the sum of the isoform levels. NEUMA is applicable to both paired-end and single-end RNA-Seq data. We propose that NEUMA could become a standard method for quantifying gene transcript levels from RNA-Seq data.
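The idea of counting informative reads can be illustrated with a minimal Python sketch. It is not the NEUMA implementation: it assumes single-end reads of a fixed length, exact matching, and a toy transcriptome, and it ignores the fragment size distribution the abstract mentions. The function names `informative_read_counts` and `normalized_abundance` are hypothetical, the gene-wise rule follows one reading of the abstract (a read sequence present in every isoform of one gene and in no other gene), and the normalization is a generic reads-per-kilobase-per-million form used here as a stand-in for the paper's formula.

```python
from collections import defaultdict


def informative_read_counts(isoforms, gene_of, read_len):
    """Count distinct informative read sequences per isoform and per gene.

    isoforms: dict mapping isoform id -> nucleotide sequence
    gene_of:  dict mapping isoform id -> gene id
    Assumption: a "read" is any substring of length read_len (single-end,
    exact match); repeated occurrences of one sequence are counted once.
    """
    # Record which isoforms contain each possible read sequence.
    owners = defaultdict(set)
    for iso, seq in isoforms.items():
        for i in range(len(seq) - read_len + 1):
            owners[seq[i:i + read_len]].add(iso)

    # Group isoform ids by gene so we can test "all isoforms of a gene".
    isoforms_of = defaultdict(set)
    for iso, gene in gene_of.items():
        isoforms_of[gene].add(iso)

    iso_area = defaultdict(int)
    gene_area = defaultdict(int)
    for isos in owners.values():
        # Isoform-wise informative: found in exactly one isoform.
        if len(isos) == 1:
            iso_area[next(iter(isos))] += 1
        # Gene-wise informative (assumed reading): found in every isoform
        # of a single gene and in no isoform of any other gene.
        genes = {gene_of[i] for i in isos}
        if len(genes) == 1:
            gene = next(iter(genes))
            if isos == isoforms_of[gene]:
                gene_area[gene] += 1
    return dict(iso_area), dict(gene_area)


def normalized_abundance(read_count, area, total_reads):
    """Reads per kilobase of uniquely mappable area per million reads.

    Assumption: an FPKM-like form, not the formula from the paper.
    """
    return read_count * 1e9 / (area * total_reads)
```

With a toy input of two isoforms of one gene that share a common prefix plus one unrelated gene, the shared prefix contributes only to the gene-wise count, while the distinct suffixes contribute only to the isoform-wise counts. That mirrors the distinction the abstract draws between gene-wise and isoform-wise informative reads.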
Pages: 10
Related papers
15 in total
  • [1] Transcript quantification with RNA-Seq data
    Bohnert, Regina
    Behr, Jonas
    Raetsch, Gunnar
    [J]. BMC BIOINFORMATICS, 2009, 10 : P5
  • [2] Towards reliable isoform quantification using RNA-SEQ data
    Howard, Brian E.
    Heber, Steffen
    [J]. BMC BIOINFORMATICS, 2010, 11
  • [3] Statistical inferences for isoform expression in RNA-Seq
    Jiang, Hui
    Wong, Wing Hung
    [J]. BIOINFORMATICS, 2009, 25 (08) : 1026 - 1032
  • [4] Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
    Langmead, Ben
    Trapnell, Cole
    Pop, Mihai
    Salzberg, Steven L.
    [J]. GENOME BIOLOGY, 2009, 10 (03):
  • [5] RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays
    Marioni, John C.
    Mason, Christopher E.
    Mane, Shrikant M.
    Stephens, Matthew
    Gilad, Yoav
    [J]. GENOME RESEARCH, 2008, 18 (09) : 1509 - 1517
  • [6] Mapping and quantifying mammalian transcriptomes by RNA-Seq
    Mortazavi, Ali
    Williams, Brian A.
    McCue, Kenneth
    Schaeffer, Lorian
    Wold, Barbara
    [J]. NATURE METHODS, 2008, 5 (07) : 621 - 628
  • [7] The transcriptional landscape of the yeast genome defined by RNA sequencing
    Nagalakshmi, Ugrappa
    Wang, Zhong
    Waern, Karl
    Shou, Chong
    Raha, Debasish
    Gerstein, Mark
    Snyder, Michael
    [J]. SCIENCE, 2008, 320 (5881) : 1344 - 1349
  • [8] An Abundance of Ubiquitously Expressed Genes Revealed by Tissue Transcriptome Sequence Data
    Ramskold, Daniel
    Wang, Eric T.
    Burge, Christopher B.
    Sandberg, Rickard
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2009, 5 (12)
  • [9] Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments
    Richard, Hugues
    Schulz, Marcel H.
    Sultan, Marc
    Nuernberger, Asja
    Schrinner, Sabine
    Balzereit, Daniela
    Dagand, Emilie
    Rasche, Axel
    Lehrach, Hans
    Vingron, Martin
    Haas, Stefan A.
    Yaspo, Marie-Laure
    [J]. NUCLEIC ACIDS RESEARCH, 2010, 38 (10) : e112
  • [10] A two-parameter generalized Poisson model to improve the analysis of RNA-seq data
    Srivastava, Sudeep
    Chen, Liang
    [J]. NUCLEIC ACIDS RESEARCH, 2010, 38 (17) : e170 - e170