A crowdsourced set of curated structural variants for the human genome

Cited by: 4
Authors
Chapman, Lesley M. [1, 31]
Spies, Noah [1, 2, 3, 4, 32]
Pai, Patrick [5]
Lim, Chun Shen [6]
Carroll, Andrew [7]
Narzisi, Giuseppe [8]
Watson, Christopher M. [9, 10]
Proukakis, Christos [11]
Clarke, Wayne E. [8]
Nariai, Naoki [12]
Dawson, Eric [13, 14]
Jones, Garan [15]
Blankenberg, Daniel [16]
Brueffer, Christian [17]
Xiao, Chunlin [18]
Kolora, Rohit Raj [19, 20, 21, 22]
Alexander, Noah [23]
Wolujewicz, Paul [24]
Ahmed, Azza E. [25, 26]
Smith, Graeme [27, 28]
Shehreen, Saadlee [29]
Wenger, Aaron M. [30]
Salit, Marc [1, 2]
Zook, Justin M. [1]
Affiliations
[1] NIST, Mat Measurement Lab, Biosyst & Biomat Div, Gaithersburg, MD 20899 USA
[2] Stanford Univ, Joint Initiat Metrol Biol, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Pathol, Stanford, CA 94305 USA
[5] Univ Maryland, College Pk, MD 20742 USA
[6] Univ Otago, Sch Biomed Sci, Dept Biochem, Dunedin, New Zealand
[7] DNAnexus Inc, Mountain View, CA USA
[8] New York Genome Ctr, New York, NY USA
[9] Univ Leeds, St Jamess Univ Hosp, Sch Med, Leeds, W Yorkshire, England
[10] Leeds Teaching Hosp NHS Trust, St Jamess Univ Hosp, Yorkshire Reg Genet Serv, Leeds, W Yorkshire, England
[11] UCL, Inst Neurol, London, England
[12] Illumina Inc, San Diego, CA USA
[13] NCI, Div Canc Epidemiol & Genet, NIH, Rockville, MD USA
[14] Univ Cambridge, Dept Genet, Cambridge, England
[15] Univ Exeter, Sch Med, Epidemiol & Publ Hlth Grp, Barrack Rd, Exeter, Devon, England
[16] Cleveland Clin, Lerner Res Inst, Genom Med Inst, Cleveland, OH 44106 USA
[17] Lund Univ, Dept Clin Sci Lund, Div Oncol & Pathol, Lund, Sweden
[18] NIH, Natl Ctr Biotechnol Informat, Natl Lib Med, Bldg 10, Bethesda, MD 20892 USA
[19] German Ctr Integrat Biodivers Res iDiv, Leipzig, Germany
[20] Univ Leipzig, Dept Comp Sci, Bioinformat Grp, Leipzig, Germany
[21] Univ Leipzig, Interdisciplinary Ctr Bioinformat, Leipzig, Germany
[22] Univ Leipzig, Inst Biol, Mol Evolut & Systemat Anim, Leipzig, Germany
[23] Univ Calif Los Angeles, Inst Mol Biol, Los Angeles, CA 90024 USA
[24] Weill Cornell, Belfer Res Bldg, New York, NY USA
[25] Univ Khartoum, Fac Sci, Ctr Bioinformat & Syst Biol, Khartoum, Sudan
[26] Univ Khartoum, Fac Engn, Dept Elect & Elect Engn, Khartoum, Sudan
[27] Guys Hosp, London, England
[28] St Thomass NHS Fdn Trust, London, England
[29] Univ Dhaka, Dept Genet Engn & Biotechnol, Dhaka, Bangladesh
[30] Pacific Biosci, Menlo Pk, CA USA
[31] NCI, Bethesda, MD 20892 USA
[32] Celsius Therapeut, Cambridge, MA USA
Funding
US National Institutes of Health (NIH)
Keywords
Benchmarking; Crowdsourcing; Genes
DOI
10.1371/journal.pcbi.1007933
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A high-quality benchmark for small variants encompassing 88 to 90% of the reference genome has been developed for seven Genome in a Bottle (GIAB) reference samples. However, a reliable benchmark for large indels and structural variants (SVs) is more challenging to produce. In this study, we manually curated 1235 SVs, which can ultimately be used to evaluate SV callers or train machine learning models. We developed a crowdsourcing app, SVCurator, to help GIAB curators manually review large indels and SVs within the human genome and report their genotype and size accuracy. SVCurator displays images from short, long, and linked read sequencing data from the GIAB Ashkenazi Jewish Trio son [NIST RM 8391/HG002]. We asked curators to assign labels describing SV type (deletion or insertion), size accuracy, and genotype for 1235 putative insertions and deletions sampled from different size bins between 20 and 892,149 bp. 'Expert' curators were 93% concordant with each other, and 37 of the 61 curators had at least 78% concordance with a set of 'expert' curators. The curators were least concordant for complex SVs and for SVs that had inaccurate breakpoints or size predictions. After filtering events with low concordance among curators, we produced high-confidence labels for 935 events. The SVCurator crowdsourced labels were 94.5% concordant with the heuristic-based draft benchmark SV callset from GIAB. We found that curators can successfully evaluate putative SVs when given evidence from multiple sequencing technologies.
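The abstract reports concordance between curators and a filtering step that discards events with low agreement, without giving the exact procedure. The following is a minimal sketch of how such a calculation might look, not the authors' method: the function names, the `DEL`/`INS` label strings, the event IDs, and the 0.7 agreement threshold are all hypothetical choices made for illustration.

```python
from collections import Counter


def pairwise_concordance(labels_a, labels_b):
    """Fraction of events labeled by both curators on which they agree.

    Each argument maps an event ID to a label. Returns None when the
    two curators share no events.
    """
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        return None
    agree = sum(labels_a[event] == labels_b[event] for event in shared)
    return agree / len(shared)


def consensus_labels(votes, min_agreement=0.7):
    """Keep events whose most common label reaches min_agreement.

    `votes` maps an event ID to the list of labels assigned by curators.
    The 0.7 default is an illustrative threshold, not a value from the paper.
    """
    kept = {}
    for event, labels in votes.items():
        if not labels:
            continue
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            kept[event] = label
    return kept


# Hypothetical labels for a handful of made-up events.
expert = {"sv1": "DEL", "sv2": "INS", "sv3": "DEL", "sv4": "INS"}
curator = {"sv1": "DEL", "sv2": "INS", "sv3": "INS", "sv5": "DEL"}
print(pairwise_concordance(expert, curator))  # agreement on 2 of 3 shared events

votes = {
    "sv1": ["DEL", "DEL", "DEL"],
    "sv2": ["INS", "DEL", "INS", "INS"],
    "sv3": ["DEL", "INS"],
}
print(consensus_labels(votes))  # sv3 is dropped: no label reaches the threshold
```

Restricting the pairwise comparison to shared events matters in a crowdsourced setting, because curators typically review different subsets of the candidate events.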
Pages: 20