Statistical Design and Analysis of RNA Sequencing Data

Cited by: 255
Authors
Auer, Paul L. [1]
Doerge, R. W. [1]
Affiliations
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Funding
National Science Foundation (USA)
Keywords
GENE-EXPRESSION; SERIAL ANALYSIS; DIFFERENTIAL EXPRESSION; SAGE LIBRARIES; BIOCONDUCTOR; REGRESSION; TESTS; SEQ;
DOI
10.1534/genetics.110.114983
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Next-generation sequencing technologies are quickly becoming the preferred approach for characterizing and quantifying entire genomes. Even though data produced from these technologies are proving to be the most informative of any thus far, very little attention has been paid to fundamental design aspects of data collection and analysis, namely sampling, randomization, replication, and blocking. We discuss these concepts in an RNA sequencing framework. Using simulations, we demonstrate the benefits of collecting replicated RNA sequencing data according to well-known statistical designs that partition the sources of biological and technical variation. Examples of these designs and their corresponding models are presented with the goal of testing differential expression.
Pages: 405 - U32
Page count: 16