Medical implications of technical accuracy in genome sequencing

Cited by: 97
Authors
Goldfeder, Rachel L. [1, 3]
Priest, James R. [3, 5]
Zook, Justin M. [2]
Grove, Megan E. [1, 3]
Waggott, Daryl [1, 3]
Wheeler, Matthew T. [1, 3]
Salit, Marc [2]
Ashley, Euan A. [1, 3, 4]
Affiliations
[1] Stanford Univ, Dept Med, Stanford, CA 94305 USA
[2] NIST, Genome Scale Measurements Grp, Gaithersburg, MD 20899 USA
[3] Stanford Univ, Stanford Ctr Inherited Cardiovasc Dis, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Genet, Stanford, CA 94305 USA
[5] Stanford Univ, Dept Pediat, Stanford, CA 94305 USA
Keywords
WHOLE-EXOME CAPTURE; CLINICAL EXOME; DE-NOVO; IDENTIFICATION; VARIANTS; MUTATION; SENSITIVITY; PERFORMANCE; DIAGNOSIS; PATIENT
DOI
10.1186/s13073-016-0269-0
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background: As whole exome sequencing (WES) and whole genome sequencing (WGS) transition from research tools to clinical diagnostic tests, it is increasingly critical for sequencing methods and analysis pipelines to be technically accurate. The Genome in a Bottle Consortium has recently published a set of benchmark SNV, indel, and homozygous reference genotypes for the pilot whole genome NIST Reference Material based on the NA12878 genome.
Methods: We examine the relationship between human genome complexity and genes/variants reported to be associated with human disease. Specifically, we map regions of medical relevance to benchmark regions of high or low confidence. We use benchmark data to assess the sensitivity and positive predictive value of two representative sequencing pipelines for specific classes of variation.
Results: We observe that the accuracy of a variant call depends on the genomic region, variant type, and read depth, and varies by analytical pipeline. We find that most false negative WGS calls result from filtering while most false negative WES variants relate to poor coverage. We find that only 74.6% of the exonic bases in ClinVar and OMIM genes and 82.1% of the exonic bases in ACMG-reportable genes are found in high-confidence regions. Only 990 genes in the genome are found entirely within high-confidence regions while 593 of 3,300 ClinVar/OMIM genes have less than 50% of their total exonic base pairs in high-confidence regions. We find greater than 77% of the pathogenic or likely pathogenic SNVs currently in ClinVar fall within high-confidence regions. We identify sites that are prone to sequencing errors, including thousands present in publicly available variant databases. Finally, we examine the clinical impact of mandatory reporting of secondary findings, highlighting a false positive variant found in BRCA2.
Conclusions: Together, these data illustrate the importance of appropriate use and continued improvement of technical benchmarks to ensure accurate and judicious interpretation of next-generation DNA sequencing results in the clinical setting.
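The two metrics named in the Methods can be illustrated with a minimal sketch. It uses the standard definitions of sensitivity, TP / (TP + FN), and positive predictive value, TP / (TP + FP), where calls are compared against a benchmark call set. The function names and the example counts are hypothetical and are not taken from the paper. The only figure drawn from the abstract is the 593 of 3,300 ClinVar/OMIM genes, used here to show the corresponding percentage.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of benchmark variants the pipeline recovered: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no benchmark variants to evaluate")
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of pipeline calls that match the benchmark: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("pipeline made no calls")
    return tp / (tp + fp)


def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100.0 * part / whole


# Hypothetical counts, for illustration only (not from the paper).
tp, fp, fn = 90, 5, 10
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.3f}")

# From the abstract: 593 of 3,300 ClinVar/OMIM genes have < 50% of exonic
# bases in high-confidence regions.
print(f"share of genes = {percent(593, 3300):.1f}%")
```

A false negative caused by filtering and one caused by poor coverage both count toward FN in this sketch; separating them, as the Results do, would require tracking the reason each benchmark variant was missed.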
Pages: 12