A robust benchmark for detection of germline large deletions and insertions

Cited by: 219
Authors
Zook, Justin M. [1]
Hansen, Nancy F. [2]
Olson, Nathan D. [1]
Chapman, Lesley [1]
Mullikin, James C. [2]
Xiao, Chunlin [3]
Sherry, Stephen [3]
Koren, Sergey [2]
Phillippy, Adam M. [2]
Boutros, Paul C. [4]
Sahraeian, Sayed Mohammad E. [5]
Huang, Vincent [6]
Rouette, Alexandre [7]
Alexander, Noah [8]
Mason, Christopher E. [9, 10, 11, 12]
Hajirasouliha, Iman [9]
Ricketts, Camir [9]
Lee, Joyce [13]
Tearle, Rick [14]
Fiddes, Ian T. [15]
Barrio, Alvaro Martinez [15]
Wala, Jeremiah [16]
Carroll, Andrew [17]
Ghaffari, Noushin [18]
Rodriguez, Oscar L. [19]
Bashir, Ali [19]
Jackman, Shaun [20]
Farrell, John J. [21]
Wenger, Aaron M. [22]
Alkan, Can [23]
Soylev, Arda [24]
Schatz, Michael C. [25]
Garg, Shilpa [26]
Church, George [26]
Marschall, Tobias [27]
Chen, Ken [28]
Fan, Xian [29]
English, Adam C. [30]
Rosenfeld, Jeffrey A. [31, 32]
Zhou, Weichen [33]
Mills, Ryan E. [33]
Sage, Jay M. [34]
Davis, Jennifer R. [34]
Kaiser, Michael D. [34]
Oliver, John S. [34]
Catalano, Anthony P. [34]
Chaisson, Mark J. P. [35]
Spies, Noah [36]
Sedlazeck, Fritz J. [37]
Salit, Marc [36]
Affiliations
[1] NIST, Mat Measurement Lab, Gaithersburg, MD 20899 USA
[2] NHGRI, NIH, Rockville, MD USA
[3] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD 20894 USA
[4] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA USA
[5] Roche Sequencing Solut, Belmont, CA USA
[6] Ontario Inst Canc Res, Toronto, ON, Canada
[7] CHU St Justine, Div Hematol Oncol, Charles Bruneau Canc Ctr, Montreal, PQ, Canada
[8] Univ Calif Los Angeles, Inst Mol Biol, Los Angeles, CA 90024 USA
[9] Weill Cornell Med, Inst Computat Biomed, Dept Physiol & Biophys, New York, NY USA
[10] Weill Cornell Med, HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsau, New York, NY USA
[11] Weill Cornell Med, WorldQuant Initiat Quantitat Predict, New York, NY USA
[12] Weill Cornell Med, Feil Family Brain & Mind Res Inst, New York, NY USA
[13] Bionano Genom Inc, San Diego, CA USA
[14] Univ Adelaide, Sch Anim & Vet Sci, Davies Res Ctr, Roseworthy, SA, Australia
[15] 10X Genom, Pleasanton, CA USA
[16] Broad Inst MIT & Harvard, Cambridge, MA 02142 USA
[17] Google, Mountain View, CA USA
[18] Prairie View A&M Univ, Roy G Perry Coll Engn, Dept Comp Sci, Prairie View, TX USA
[19] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[20] BC Canc Genome Sci Ctr, Vancouver, BC, Canada
[21] Boston Univ, Sch Med, Dept Med, Biomed Genet, Boston, MA 02118 USA
[22] Pacific Biosci, Menlo Pk, CA USA
[23] Bilkent Univ, Dept Comp Engn, Ankara, Turkey
[24] Konya Food & Agr Univ, Dept Comp Engn, Konya, Turkey
[25] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[26] Harvard Med Sch, Dept Genet, Boston, MA 02115 USA
[27] Heinrich Heine Univ, Fac Med, Dusseldorf, Germany
[28] Univ Texas MD Anderson Canc Ctr, Dept Bioinformat & Computat Biol, Houston, TX 77030 USA
[29] Rice Univ, Dept Comp Sci, Houston, TX USA
[30] Spiral Genet, Bioinformat R&D, Seattle, WA USA
[31] Rutgers Canc Inst New Jersey, New Brunswick, NJ USA
[32] Univ Med & Dent New Jersey, Dept Pathol, New Brunswick, NJ USA
[33] Univ Michigan, Sch Med, Dept Computat Med & Bioinformat, Ann Arbor, MI USA
[34] Nabsys 2.0 LLC, Providence, RI USA
[35] Univ Southern Calif, Quantitat & Computat Biol, Los Angeles, CA 90007 USA
[36] Stanford Univ, SLAC Natl Accelerator Lab, Joint Initiat Metrol Biol, Stanford, CA 94305 USA
[37] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX 77030 USA
Funding
National Institutes of Health (USA);
Keywords
STRUCTURAL VARIATION; HUMAN GENOME; VARIANTS; RESOURCE; SNP;
DOI
10.1038/s41587-020-0538-8
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification code
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Detection of structural variants in the human genome is facilitated by a benchmark set of large deletions and insertions. New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls >= 50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by >= 1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
Pages: 1347 / +
Page count: 14