Normalization of RNA-seq data using factor analysis of control genes or samples

Cited by: 1238
Authors
Risso, Davide [1]
Ngai, John [2, 3, 4]
Speed, Terence P. [1, 5, 6]
Dudoit, Sandrine [1, 7]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Helen Wills Neurosci Inst, Berkeley, CA 94720 USA
[4] Univ Calif Berkeley, Funct Genom Lab, Berkeley, CA 94720 USA
[5] Royal Melbourne Hosp, Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic 3050, Australia
[6] Univ Melbourne, Dept Math & Stat, Melbourne, Vic 3010, Australia
[7] Univ Calif Berkeley, Div Biostat, Berkeley, CA 94720 USA
Funding
UK Medical Research Council;
Keywords
LOCALLY WEIGHTED REGRESSION; MESSENGER-RNA; DIFFERENTIAL EXPRESSION; MICROARRAY DATA; SINGLE;
DOI
10.1038/nbt.2931
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Normalization of RNA-sequencing (RNA-seq) data has proven essential to ensure accurate inference of expression levels. Here, we show that usual normalization approaches mostly account for sequencing depth and fail to correct for library preparation and other more complex unwanted technical effects. We evaluate the performance of the External RNA Control Consortium (ERCC) spike-in controls and investigate the possibility of using them directly for normalization. We show that the spike-ins are not reliable enough to be used in standard global-scaling or regression-based normalization procedures. We propose a normalization strategy, called remove unwanted variation (RUV), that adjusts for nuisance technical effects by performing factor analysis on suitable sets of control genes (e.g., ERCC spike-ins) or samples (e.g., replicate libraries). Our approach leads to more accurate estimates of expression fold-changes and tests of differential expression compared to state-of-the-art normalization methods. In particular, RUV promises to be valuable for large collaborative projects involving multiple laboratories, technicians, and/or sequencing platforms.
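The factor-analysis idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified linear model on log counts, not the authors' actual procedure or software: the function name `ruv_control_gene_sketch`, the `pseudocount` parameter, and the samples-by-genes layout are all hypothetical choices made for illustration. Unwanted factors are estimated by a singular value decomposition of the centered log expression of the control genes, and each gene's fitted dependence on those factors is then subtracted.

```python
import numpy as np


def ruv_control_gene_sketch(counts, control_idx, k=1, pseudocount=1.0):
    """Remove k unwanted factors estimated from control genes (illustrative only).

    counts      : (n_samples, n_genes) array of non-negative read counts
    control_idx : indices of genes assumed unaffected by the biology of interest
    Returns log-scale expression with the estimated nuisance factors removed.
    """
    log_expr = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Center each gene across samples so the factors capture variation, not mean level.
    centered = log_expr - log_expr.mean(axis=0, keepdims=True)
    # Factor-analysis step: SVD restricted to the control genes.
    u, s, _ = np.linalg.svd(centered[:, control_idx], full_matrices=False)
    w_hat = u[:, :k] * s[:k]  # estimated unwanted factors, shape (n_samples, k)
    # Regress every gene on the estimated factors and subtract the fitted part.
    alpha_hat, *_ = np.linalg.lstsq(w_hat, log_expr, rcond=None)
    return log_expr - w_hat @ alpha_hat
```

The sketch relies on the control genes carrying only nuisance variation. If the controls also respond to the biology of interest, the subtraction would remove part of the signal as well.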
Pages: 896-902
Number of pages: 7