A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

Cited by: 666
Authors
Su, Zhenqiang [1 ]
Labaj, Pawel P. [2 ]
Li, Sheng [3 ,4 ]
Thierry-Mieg, Jean [5 ]
Thierry-Mieg, Danielle [5 ]
Shi, Wei [6 ,7 ]
Wang, Charles [8 ,9 ]
Schroth, Gary P. [10 ]
Setterquist, Robert A. [11 ]
Thompson, John F. [12 ]
Jones, Wendell D. [13 ]
Xiao, Wenzhong [14 ,15 ]
Xu, Weihong [15 ]
Jensen, Roderick V. [16 ]
Kelly, Reagan [1 ]
Xu, Joshua [1 ]
Conesa, Ana [17 ]
Furlanello, Cesare [18 ]
Gao, Hanlin [19 ]
Hong, Huixiao [1 ]
Jafari, Nadereh [20 ]
Letovsky, Stan [21 ]
Liao, Yang [6 ,22 ]
Lu, Fei [23 ]
Oakeley, Edward J. [24 ]
Peng, Zhiyu [25 ]
Praul, Craig A. [26 ]
Santoyo-Lopez, Javier [27 ,28 ]
Scherer, Andreas [29 ,30 ]
Shi, Tieliu [31 ,32 ]
Smyth, Gordon K. [6 ,33 ]
Staedtler, Frank [24 ]
Sykacek, Peter [2 ]
Tan, Xin-Xing [23 ]
Thompson, E. Aubrey [34 ]
Vandesompele, Jo [35 ]
Wang, May D. [36 ,37 ]
Wang, Jian [38 ]
Wolfinger, Russell D. [39 ]
Zavadil, Jiri [40 ,41 ,42 ]
Auerbach, Scott S. [43 ]
Bao, Wenjun [39 ]
Binder, Hans [44 ]
Blomquist, Thomas [45 ]
Brilliant, Murray H. [46 ]
Bushel, Pierre R. [43 ]
Cain, Weimin [47 ]
Catalano, Jennifer G. [48 ]
Chang, Ching-Wei [1 ]
Chen, Tao [1 ]
Affiliations
[1] US FDA, Natl Ctr Toxicol Res, Jefferson, AR 72079 USA
[2] Boku Univ Vienna, Chair Bioinformat Res Grp, Vienna, Austria
[3] Weill Cornell Med Coll, Dept Physiol & Biophys, New York, NY USA
[4] Weill Cornell Med Coll, HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsau, New York, NY USA
[5] NCBI, NIH, Bethesda, MD USA
[6] Walter & Eliza Hall Inst Med Res, Bioinformat Div, Parkville, Vic, Australia
[7] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic 3052, Australia
[8] Loma Linda Univ, Ctr Genom, Sch Med, Loma Linda, CA 92350 USA
[9] Loma Linda Univ, Div Microbiol & Mol Genet, Sch Med, Loma Linda, CA 92350 USA
[10] Illumina Inc, Hayward, CA USA
[11] Life Technol Corp, Austin, TX USA
[12] Claritas Genom, Cambridge, MA USA
[13] Express Anal Inc, Durham, NC USA
[14] Harvard Univ, Massachusetts Gen Hosp, Sch Med, Boston, MA USA
[15] Stanford Genome Technol Ctr, Palo Alto, CA USA
[16] Virginia Tech, Dept Biol Sci, Blacksburg, VA USA
[17] Ctr Invest Principe Felipe, Computat Genom Program, Valencia, Spain
[18] Fdn Bruno Kessler, Trento, Trento, Italy
[19] City Hope Natl Med Ctr, DNA Sequencing Solexa Core, Beckman Res Inst, City Hope Comprehens Canc Ctr, Duarte, CA 91010 USA
[20] Northwestern Univ, Ctr Genet Med, Feinberg Sch Med, Chicago, IL 60611 USA
[21] SynapDx Corp, Lexington, MA USA
[22] Univ Melbourne, Dept Med Biol, Parkville, Vic 3052, Australia
[23] GE Healthcare SeqWright Genom Serv, Houston, TX USA
[24] Novartis Inst Biomed Res, Basel, Switzerland
[25] BGI Shenzhen, Bei Shan Ind Zone, Shenzhen, Guangdong, Peoples R China
[26] Penn State Univ, University Pk, PA 16802 USA
[27] Genom & Bioinformat Platform Andalusia, Med Genome Project, Seville, Spain
[28] Univ Edinburgh, Edinburgh Genom, Edinburgh, Midlothian, Scotland
[29] Australian Genome Res Facil Ltd, Walter & Eliza Hall Inst Med Res, Parkville, Vic, Australia
[30] Spheromics, Kontiolahti, Finland
[31] E China Normal Univ, Inst Biomed Sci, Ctr Bioinformat & Computat Biol, Shanghai Key Lab Regulatory Biol, Shanghai 200062, Peoples R China
[32] E China Normal Univ, Sch Life Sci, Shanghai 200062, Peoples R China
[33] Univ Melbourne, Dept Math & Stat, Parkville, Vic 3052, Australia
[34] Mayo Clin Jacksonville, Dept Canc Biol, Jacksonville, FL 32224 USA
[35] Biogazelle, Zwijnaarde, Belgium
[36] GeorgiaTech, Dept Biomed Engn, Atlanta, GA USA
[37] Emory Univ, Atlanta, GA 30322 USA
[38] Eli Lilly & Co, Lilly Corp Ctr, Res Informat, Indianapolis, IN 46285 USA
[39] SAS Inst Inc, Cary, NC USA
[40] NYU, Langone Med Ctr, NYU Genome Technol Ctr, New York, NY USA
[41] NYU, Langone Med Ctr, NYU Ctr Hlth Informat & Bioinformat, Dept Pathol, New York, NY USA
[42] Int Agcy Res Canc, Mol Mech & Biomarkers Grp, F-69372 Lyon, France
[43] NIEHS, NIH, Res Triangle Pk, NC 27709 USA
[44] Univ Leipzig, Interdisciplinary Ctr Bioinformat, D-04109 Leipzig, Germany
[45] Univ Toledo, Div Pulm & Crit Care Med, Dept Med, Med Coll Ohio, Toledo, OH 43606 USA
[46] Marshfield Clin Res Fdn, Ctr Human Genet, Marshfield, WI USA
[47] Fudan Univ, Sch Pharm, Shanghai 200433, Peoples R China
[48] US FDA, Off Cellular Tissue & Gene Therapies, CBER, Bethesda, MD 20014 USA
[49] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, Icahn Inst Genom & Multiscale Biol, New York, NY 10029 USA
[50] Johannes Kepler Univ Linz, Inst Bioinformat, A-4040 Linz, Austria
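Funding
National Natural Science Foundation of China; UK Biotechnology and Biological Sciences Research Council; UK Medical Research Council; US National Institutes of Health;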
Keywords
DIFFERENTIAL GENE-EXPRESSION; GENOME ANNOTATION; TRANSCRIPTOME; ARRAYS; BIAS; PCR;
DOI
10.1038/nbt.2957
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
Pages: 903-914
Page count: 12