Variant detection sensitivity and biases in whole genome and exome sequencing

Times cited: 155
Authors
Meynert, Alison M. [1]
Ansari, Morad [1]
FitzPatrick, David R. [1]
Taylor, Martin S. [1]
Affiliation
[1] Univ Edinburgh, Western Gen Hosp, MRC Inst Genet & Mol Med, MRC Human Genet Unit, Edinburgh EH4 2XU, Midlothian, Scotland
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Funding
Medical Research Council (UK)
Keywords
SNP; Sensitivity; Protein-coding genes; Next-generation sequencing; Whole genome sequencing; Exome sequencing; DATA SETS; ACCURATE; FRAMEWORK;
DOI
10.1186/1471-2105-15-247
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease-causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research, and increasingly the clinical use of sequence data, is likely to remain focused on the protein-coding exome. We set out to quantify and understand how WGS compares with targeted capture and sequencing of the exome (exome-seq) for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome-targeted regions.

Results: We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subjected to both deep exome and whole genome sequencing. Detection sensitivity was scored by downsampling the sequence data and comparing the resulting calls against a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and less bias in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS requires a mean of only 14 reads. Known disease-causing mutations are not biased towards easy- or hard-to-sequence regions of the genome for either exome-seq or WGS.

Conclusions: From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers more uniform read coverage and more balanced allele ratio calls; in most cases both advantages can be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection owing to technical difficulties or probe failures. Because WGS yields intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.
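The sensitivity scoring described in the Results (calls from downsampled data compared against a per-sample gold-standard SNP set) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `sensitivity` and `depth_for_target` are hypothetical, variants are assumed to match only on exact (chromosome, position, ref, alt) keys, the downsampling and variant calling steps themselves are not shown, and the "depth needed for 95% sensitivity" is estimated here by simple linear interpolation between tested depths. The toy call sets and depth/sensitivity values are invented for illustration and are not data from the paper.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def sensitivity(calls: Iterable[Variant], gold: Set[Variant]) -> float:
    """Fraction of gold-standard SNPs recovered by a call set (true-positive rate).

    Calls absent from the gold standard are ignored; they affect precision,
    not sensitivity.
    """
    if not gold:
        raise ValueError("gold-standard set is empty")
    recovered = gold.intersection(calls)
    return len(recovered) / len(gold)


def depth_for_target(curve: Dict[float, float], target: float = 0.95) -> Optional[float]:
    """Estimate the smallest mean depth at which sensitivity reaches `target`.

    `curve` maps mean depth (one entry per downsampling level) to sensitivity.
    Linearly interpolates between the last depth below the target and the first
    depth at or above it. Returns None if the target is never reached.
    """
    prev: Optional[Tuple[float, float]] = None
    for depth, sens in sorted(curve.items()):
        if sens >= target:
            if prev is None:
                return depth  # already at target at the lowest tested depth
            prev_depth, prev_sens = prev
            frac = (target - prev_sens) / (sens - prev_sens)  # prev_sens < target <= sens
            return prev_depth + frac * (depth - prev_depth)
        prev = (depth, sens)
    return None


# Invented toy data (not from the paper).
gold = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
        ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
calls = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"),
         ("chr2", 75, "T", "C"), ("chr3", 10, "A", "T")]
curve = {10: 0.80, 20: 0.90, 30: 0.94, 40: 0.96}

if __name__ == "__main__":
    print(sensitivity(calls, gold))  # 3 of 4 gold SNPs recovered
    print(depth_for_target(curve))   # interpolated between depths 30 and 40
```

Interpolating between tested depths is only one way to read off a threshold; fitting a smooth curve to the downsampled points, or simply reporting the first tested depth that reaches the target, are equally plausible choices.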
Pages: 11