A robust benchmark for detection of germline large deletions and insertions

Citations: 219
Authors
Zook, Justin M. [1]
Hansen, Nancy F. [2]
Olson, Nathan D. [1]
Chapman, Lesley [1]
Mullikin, James C. [2]
Xiao, Chunlin [3]
Sherry, Stephen [3]
Koren, Sergey [2]
Phillippy, Adam M. [2]
Boutros, Paul C. [4]
Sahraeian, Sayed Mohammad E. [5]
Huang, Vincent [6]
Rouette, Alexandre [7]
Alexander, Noah [8]
Mason, Christopher E. [9, 10, 11, 12]
Hajirasouliha, Iman [9]
Ricketts, Camir [9]
Lee, Joyce [13]
Tearle, Rick [14]
Fiddes, Ian T. [15]
Barrio, Alvaro Martinez [15]
Wala, Jeremiah [16]
Carroll, Andrew [17]
Ghaffari, Noushin [18]
Rodriguez, Oscar L. [19]
Bashir, Ali [19]
Jackman, Shaun [20]
Farrell, John J. [21]
Wenger, Aaron M. [22]
Alkan, Can [23]
Soylev, Arda [24]
Schatz, Michael C. [25]
Garg, Shilpa [26]
Church, George [26]
Marschall, Tobias [27]
Chen, Ken [28]
Fan, Xian [29]
English, Adam C. [30]
Rosenfeld, Jeffrey A. [31, 32]
Zhou, Weichen [33]
Mills, Ryan E. [33]
Sage, Jay M. [34]
Davis, Jennifer R. [34]
Kaiser, Michael D. [34]
Oliver, John S. [34]
Catalano, Anthony P. [34]
Chaisson, Mark J. P. [35]
Spies, Noah [36]
Sedlazeck, Fritz J. [37]
Salit, Marc [36]
Affiliations
[1] NIST, Mat Measurement Lab, Gaithersburg, MD 20899 USA
[2] NHGRI, NIH, Rockville, MD USA
[3] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[4] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA USA
[5] Roche Sequencing Solut, Belmont, CA USA
[6] Ontario Inst Canc Res, Toronto, ON, Canada
[7] CHU St Justine, Div Hematol Oncol, Charles Bruneau Canc Ctr, Montreal, PQ, Canada
[8] Univ Calif Los Angeles, Inst Mol Biol, Los Angeles, CA 90024 USA
[9] Weill Cornell Med, Inst Computat Biomed, Dept Physiol & Biophys, New York, NY USA
[10] Weill Cornell Med, HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsau, New York, NY USA
[11] Weill Cornell Med, WorldQuant Initiat Quantitat Predict, New York, NY USA
[12] Weill Cornell Med, Feil Family Brain & Mind Res Inst, New York, NY USA
[13] Bionano Genom Inc, San Diego, CA USA
[14] Univ Adelaide, Sch Anim & Vet Sci, Davies Res Ctr, Roseworthy, SA, Australia
[15] 10X Genom, Pleasanton, CA USA
[16] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[17] Google, Mountain View, CA USA
[18] Prairie View A&M Univ, Roy G Perry Coll Engn, Dept Comp Sci, Prairie View, TX USA
[19] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[20] BC Canc Genome Sci Ctr, Vancouver, BC, Canada
[21] Boston Univ, Sch Med, Dept Med, Biomed Genet, Boston, MA 02118 USA
[22] Pacific Biosci, Menlo Pk, CA USA
[23] Bilkent Univ, Dept Comp Engn, Ankara, Turkey
[24] Konya Food & Agr Univ, Dept Comp Engn, Konya, Turkey
[25] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[26] Harvard Med Sch, Dept Genet, Boston, MA 02115 USA
[27] Heinrich Heine Univ, Fac Med, Dusseldorf, Germany
[28] Univ Texas MD Anderson Canc Ctr, Dept Bioinformat & Computat Biol, Houston, TX 77030 USA
[29] Rice Univ, Dept Comp Sci, Houston, TX USA
[30] Spiral Genet, Bioinformat R&D, Seattle, WA USA
[31] Rutgers Canc Inst New Jersey, New Brunswick, NJ USA
[32] Univ Med & Dent New Jersey, Dept Pathol, New Brunswick, NJ USA
[33] Univ Michigan, Sch Med, Dept Computat Med & Bioinformat, Ann Arbor, MI USA
[34] Nabsys 2.0 LLC, Providence, RI USA
[35] Univ Southern Calif, Quantitat & Computat Biol, Los Angeles, CA 90007 USA
[36] Stanford Univ, SLAC Natl Accelerator Lab, Joint Initiat Metrol Biol, Stanford, CA 94305 USA
[37] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
Funding
US National Institutes of Health (NIH);
Keywords
STRUCTURAL VARIATION; HUMAN GENOME; VARIANTS; RESOURCE; SNP;
DOI
10.1038/s41587-020-0538-8
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification code
071005; 0836; 090102; 100705;
Abstract
Detection of structural variants in the human genome is facilitated by a benchmark set of large deletions and insertions. New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
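To make the role of the benchmark regions concrete, the following is a minimal Python sketch of how a callset might be scored against a benchmark set: benchmark calls with no matching query call count as false negatives, and query calls inside the benchmark regions with no matching benchmark call count as putative false positives. This is an illustration only, not the comparison method used in the paper. The `SV` record, the `matches` rule and its thresholds (`max_dist`, `min_size_ratio`), the greedy one-to-one matching, and the toy data are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int      # start position
    svtype: str   # "INS" or "DEL"
    size: int     # length in bp (assumed > 0)


def in_regions(sv, regions):
    """True if the SV start lies inside any (chrom, start, end) region."""
    return any(c == sv.chrom and s <= sv.pos <= e for c, s, e in regions)


def matches(a, b, max_dist=500, min_size_ratio=0.7):
    """Hypothetical matching rule: same chromosome and type, nearby, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    return min(a.size, b.size) / max(a.size, b.size) >= min_size_ratio


def compare(benchmark, calls, regions):
    """Count TP, FN and FP, restricted to the benchmark regions."""
    bench = [b for b in benchmark if in_regions(b, regions)]
    query = [q for q in calls if in_regions(q, regions)]
    used = set()  # indices of query calls already matched
    tp = 0
    for b in bench:
        # Greedy one-to-one matching (a simplification).
        for i, q in enumerate(query):
            if i not in used and matches(b, q):
                used.add(i)
                tp += 1
                break
    return {"TP": tp, "FN": len(bench) - tp, "FP": len(query) - len(used)}


# Toy data, invented for illustration.
benchmark = [
    SV("chr1", 10_000, "DEL", 300),
    SV("chr1", 50_000, "INS", 120),
    SV("chr2", 7_000, "DEL", 80),
]
calls = [
    SV("chr1", 10_050, "DEL", 290),  # matches the first benchmark deletion
    SV("chr1", 80_000, "INS", 60),   # extra call inside the regions
    SV("chr3", 500, "DEL", 1000),    # outside the regions, so not scored
]
regions = [("chr1", 1, 100_000), ("chr2", 1, 100_000)]

result = compare(benchmark, calls, regions)
print(result)
```

The call on `chr3` is ignored rather than counted as a false positive because it falls outside the benchmark regions; only inside those regions is an unmatched call treated as a putative error.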
Pages: 1347 / +
Page count: 14