Accurate Estimation of Expression Levels of Homologous Genes in RNA-seq Experiments

Cited by: 18
Authors
Pasaniuc, Bogdan [1, 2]
Zaitlen, Noah [1, 2]
Halperin, Eran [3, 4, 5]
Affiliations
[1] Harvard Univ, Sch Publ Hlth, Dept Epidemiol, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Tel Aviv Univ, Mol Microbiol & Biotechnol Dept, IL-69978 Tel Aviv, Israel
[5] Tel Aviv Univ, Blavatnik Sch Comp Sci, IL-69978 Tel Aviv, Israel
Funding
National Science Foundation (USA); Israel Science Foundation
Keywords
algorithms; gene searching; genetic mapping; genetic variation; TRANSCRIPTOMES; REVEALS; GENOME; MOUSE;
DOI
10.1089/cmb.2010.0259
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Next-generation high-throughput sequencing (NGS) is poised to replace array-based technologies as the experiment of choice for measuring RNA expression levels. Several groups have demonstrated the power of this new approach (RNA-seq), making significant and novel contributions and simultaneously proposing methodologies for the analysis of RNA-seq data. In a typical experiment, millions of short sequences (reads) are sampled from RNA extracts and mapped back to a reference genome. The number of reads mapping to each gene is used as a proxy for its corresponding RNA concentration. A significant challenge in analyzing RNA expression of homologous genes is the large fraction of reads that map to multiple locations in the reference genome. Currently, these reads are either dropped from the analysis, or a naive algorithm is used to estimate their underlying distribution. In this work, we present a rigorous alternative for handling the reads generated in an RNA-seq experiment within a probabilistic model for RNA-seq data; we develop maximum likelihood-based methods for estimating the model parameters. In contrast to previous methods, our model takes into account the fact that the DNA of the sequenced individual is not a perfect copy of the reference sequence. We show with both simulated and real RNA-seq data that our new method improves the accuracy and power of RNA-seq experiments.
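The idea of estimating expression by maximum likelihood when reads map to several homologous genes can be illustrated with a generic expectation-maximization (EM) sketch. This is not the authors' model: it assumes every compatible gene is equally likely to have produced a given read apart from its abundance, and it ignores gene length, sequencing errors, and the sequence variation between the individual and the reference that the abstract emphasizes. The function name `em_multiread_abundance` and the toy data are hypothetical.

```python
def em_multiread_abundance(read_hits, genes, n_iter=200, tol=1e-10):
    """Estimate relative gene abundances from reads that may map to several genes.

    read_hits: list of collections; each holds the genes one read maps to.
    genes:     list of all gene names.
    Returns a dict gene -> estimated fraction of reads originating from it.
    """
    # Start from a uniform guess so every gene has nonzero probability.
    theta = {g: 1.0 / len(genes) for g in genes}
    for _ in range(n_iter):
        # E-step: split each read among its candidate genes in proportion
        # to the current abundance estimates.
        counts = dict.fromkeys(genes, 0.0)
        for hits in read_hits:
            total = sum(theta[g] for g in hits)
            for g in hits:
                counts[g] += theta[g] / total
        # M-step: new abundances are the normalized expected read counts.
        new_theta = {g: counts[g] / len(read_hits) for g in genes}
        delta = max(abs(new_theta[g] - theta[g]) for g in genes)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy data (made up): 6 reads unique to gene A, 2 unique to gene B,
# and 4 reads that map equally well to both homologs.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
theta = em_multiread_abundance(reads, ["A", "B"])
print(theta)
```

On this toy input the estimates converge to roughly 0.75 for A and 0.25 for B, so the ambiguous reads are shared 3:1 in line with the unique-read evidence. Dropping the ambiguous reads would leave only 8 of the 12 reads to estimate from.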
Pages: 459-468
Number of pages: 10