A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

Cited by: 652
Authors
Su, Zhenqiang [1]
Labaj, Pawel P. [2]
Li, Sheng [3, 4]
Thierry-Mieg, Jean [5]
Thierry-Mieg, Danielle [5]
Shi, Wei [6, 7]
Wang, Charles [8, 9]
Schroth, Gary P. [10]
Setterquist, Robert A. [11]
Thompson, John F. [12]
Jones, Wendell D. [13]
Xiao, Wenzhong [14, 15]
Xu, Weihong [15]
Jensen, Roderick V. [16]
Kelly, Reagan [1]
Xu, Joshua [1]
Conesa, Ana [17]
Furlanello, Cesare [18]
Gao, Hanlin [19]
Hong, Huixiao [1]
Jafari, Nadereh [20]
Letovsky, Stan [21]
Liao, Yang [6, 22]
Lu, Fei [23]
Oakeley, Edward J. [24]
Peng, Zhiyu [25]
Praul, Craig A. [26]
Santoyo-Lopez, Javier [27, 28]
Scherer, Andreas [29, 30]
Shi, Tieliu [31, 32]
Smyth, Gordon K. [6, 33]
Staedtler, Frank [24]
Sykacek, Peter [2]
Tan, Xin-Xing [23]
Thompson, E. Aubrey [34]
Vandesompele, Jo [35]
Wang, May D. [36, 37]
Wang, Jian [38]
Wolfinger, Russell D. [39]
Zavadil, Jiri [40, 41, 42]
Auerbach, Scott S. [43]
Bao, Wenjun [39]
Binder, Hans [44]
Blomquist, Thomas [45]
Brilliant, Murray H. [46]
Bushel, Pierre R. [43]
Cain, Weimin [47]
Catalano, Jennifer G. [48]
Chang, Ching-Wei [1]
Chen, Tao [1]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] Boku Univ Vienna, Chair Bioinformat Res Grp, Vienna, Austria
[3] Weill Cornell Med Coll, Dept Physiol & Biophys, New York, NY USA
[4] Weill Cornell Med Coll, HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsau, New York, NY USA
[5] NCBI, NIH, Bethesda, MD USA
[6] Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic, Australia
[7] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic 3052, Australia
[8] Loma Linda Univ, Ctr Genom, Sch Med, Loma Linda, CA 92350 USA
[9] Loma Linda Univ, Div Microbiol & Mol Genet, Sch Med, Loma Linda, CA 92350 USA
[10] Illumina Inc, Hayward, CA USA
[11] Life Technol Corp, Austin, TX USA
[12] Claritas Genom, Cambridge, MA USA
[13] Express Anal Inc, Durham, NC USA
[14] Harvard Univ, Massachusetts Gen Hosp, Sch Med, Boston, MA USA
[15] Stanford Genome Technol Ctr, Palo Alto, CA USA
[16] Virginia Tech, Dept Biol Sci, Blacksburg, VA USA
[17] Ctr Invest Principe Felipe, Computat Genom Program, Valencia, Spain
[18] Fdn Bruno Kessler, Trento, Trento, Italy
[19] City Hope Natl Med Ctr, DNA Sequencing Solexa Core, Beckman Res Inst, City Hope Comprehens Canc Ctr, Duarte, CA 91010 USA
[20] Northwestern Univ, Ctr Genet Med, Feinberg Sch Med, Chicago, IL 60611 USA
[21] SynapDx Corp, Lexington, MA USA
[22] Univ Melbourne, Dept Med Biol, Parkville, Vic 3052, Australia
[23] GE Healthcare SeqWright Genom Serv, Houston, TX USA
[24] Novartis Inst Biomed Res, Basel, Switzerland
[25] BGI Shenzhen, Bei Shan Ind Zone, Shenzhen, Guangdong, Peoples R China
[26] Penn State Univ, University Pk, PA 16802 USA
[27] Genom & Bioinformat Platform Andalusia, Med Genome Project, Seville, Spain
[28] Univ Edinburgh, Edinburgh Genom, Edinburgh, Midlothian, Scotland
[29] Australian Genome Res Facil Ltd, Walter & Eliza Hall Inst Med Res, Parkville, Vic, Australia
[30] Spheromics, Kontiolahti, Finland
[31] E China Normal Univ, Inst Biomed Sci, Ctr Bioinformat & Computat Biol, Shanghai Key Lab Regulatory Biol, Shanghai 200062, Peoples R China
[32] E China Normal Univ, Sch Life Sci, Shanghai 200062, Peoples R China
[33] Univ Melbourne, Dept Math & Stat, Parkville, Vic 3052, Australia
[34] Mayo Clin Jacksonville, Dept Canc Biol, Jacksonville, FL 32224 USA
[35] Biogazelle, Zwijnaarde, Belgium
[36] GeorgiaTech, Dept Biomed Engn, Atlanta, GA USA
[37] Emory Univ, Atlanta, GA 30322 USA
[38] Eli Lilly & Co, Lilly Corp Ctr, Res Informat, Indianapolis, IN 46285 USA
[39] SAS Inst Inc, Cary, NC USA
[40] NYU, Langone Med Ctr, NYU Genome Technol Ctr, New York, NY USA
[41] NYU, Langone Med Ctr, NYU Ctr Hlth Informat & Bioinformat, Dept Pathol, New York, NY USA
[42] Int Agcy Res Canc, Mol Mech & Biomarkers Grp, F-69372 Lyon, France
[43] NIEHS, NIH, Res Triangle Pk, NC 27709 USA
[44] Univ Leipzig, Interdisciplinary Ctr Bioinformat, D-04109 Leipzig, Germany
[45] Univ Toledo, Div Pulm & Crit Care Med, Dept Med, Med Coll Ohio, Toledo, OH 43606 USA
[46] Marshfield Clin Res Fdn, Ctr Human Genet, Marshfield, WI USA
[47] Fudan Univ, Sch Pharm, Shanghai 200433, Peoples R China
[48] US FDA, Off Cellular Tissue & Gene Therapies, CBER, Bethesda, MD 20014 USA
[49] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, Icahn Inst Genom & Multiscale Biol, New York, NY 10029 USA
[50] Johannes Kepler Univ Linz, Inst Bioinformat, A-4040 Linz, Austria
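Funding
UK Medical Research Council; National Natural Science Foundation of China; UK Biotechnology and Biological Sciences Research Council; US National Institutes of Health;
Keywords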
DIFFERENTIAL GENE-EXPRESSION; GENOME ANNOTATION; TRANSCRIPTOME; ARRAYS; BIAS; PCR;
DOI
10.1038/nbt.2957
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
Pages: 903-914
Page count: 12