Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls

Cited by: 523
Authors
Zook, Justin M. [1]
Chapman, Brad [2]
Wang, Jason [3]
Mittelman, David [3,4,5]
Hofmann, Oliver [2]
Hide, Winston [2]
Salit, Marc [1]
Affiliations
[1] NIST, Biosyst & Biomat Div, Gaithersburg, MD 20899 USA
[2] Harvard Univ, Sch Publ Hlth, Dept Biostat, Bioinformat Core, Cambridge, MA 02138 USA
[3] Arpeggi Inc, Austin, TX USA
[4] Virginia Bioinformat Inst, Blacksburg, VA USA
[5] Dept Biol Sci, Blacksburg, VA USA
Keywords
MUTATIONS; FRAMEWORK; EXOME;
DOI
10.1038/nbt.2835
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
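The idea of integrating calls from several data sets and setting aside sites where no confident call can be made can be illustrated with a small sketch. This is not the arbitration procedure used in the paper, which the abstract does not specify; it substitutes a plain concordance rule. The function name `consensus_genotypes`, the `min_agreement` parameter, and the input layout (one dictionary per data set, keyed by chromosome and position) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); hypothetical layout


def consensus_genotypes(
    calls_by_dataset: List[Dict[Site, str]],
    min_agreement: float = 1.0,
) -> Tuple[Dict[Site, str], List[Site]]:
    """Split sites into confident consensus calls and uncertain sites.

    A site is confident only if every data set reports a genotype there and
    the most common genotype reaches `min_agreement` of the reports.
    This is a simple concordance rule, assumed for illustration only.
    """
    all_sites = set()
    for calls in calls_by_dataset:
        all_sites.update(calls)

    confident: Dict[Site, str] = {}
    uncertain: List[Site] = []
    for site in sorted(all_sites):
        observed = [calls[site] for calls in calls_by_dataset if site in calls]
        genotype, votes = Counter(observed).most_common(1)[0]
        covered_everywhere = len(observed) == len(calls_by_dataset)
        if covered_everywhere and votes / len(observed) >= min_agreement:
            confident[site] = genotype
        else:
            uncertain.append(site)
    return confident, uncertain


if __name__ == "__main__":
    # Three toy data sets with VCF-style genotype strings.
    datasets = [
        {("chr1", 100): "0/1", ("chr1", 200): "1/1"},
        {("chr1", 100): "0/1", ("chr1", 200): "0/1"},
        {("chr1", 100): "0/1"},
    ]
    confident, uncertain = consensus_genotypes(datasets)
    print(confident)
    print(uncertain)
```

In this toy input, the site at position 100 is called identically by all three data sets and is kept, while position 200 is both discordant and missing from one data set, so it lands in the uncertain list. A real pipeline would also need to record why a site was uncertain, as the abstract describes.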
Pages: 246-251
Number of pages: 6