Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

Cited by: 310
Authors
O'Rawe, Jason [1, 2]
Jiang, Tao [3]
Sun, Guangqing [3]
Wu, Yiyang [1, 2]
Wang, Wei [4]
Hu, Jingchu [3]
Bodily, Paul [5]
Tian, Lifeng [6]
Hakonarson, Hakon [6]
Johnson, W. Evan [7]
Wei, Zhi [4]
Wang, Kai [8, 9]
Lyon, Gholson J. [1, 2, 9]
Affiliations
[1] Cold Spring Harbor Lab, Stanley Inst Cognit Genom, Cold Spring Harbor, NY 11724 USA
[2] SUNY Stony Brook, Stony Brook, NY 11794 USA
[3] BGI Shenzhen, Shenzhen 518000, Peoples R China
[4] New Jersey Inst Technol, Newark, NJ 07103 USA
[5] Brigham Young Univ, Provo, UT 84606 USA
[6] Childrens Hosp Philadelphia, Philadelphia, PA 19104 USA
[7] Boston Univ, Sch Med, Boston, MA 02118 USA
[8] Univ So Calif, Los Angeles, CA 90089 USA
[9] Utah Fdn Biomed Res, Salt Lake City, UT 84106 USA
Source
GENOME MEDICINE | 2013 / Volume 5
Keywords
DE-NOVO MUTATIONS; GENOTYPE IMPUTATION; SMALL INSERTIONS; ASSOCIATION; FRAMEWORK; DELETIONS; ALIGNMENT; RATES; TOOL;
DOI
10.1186/gm432
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.
Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.
Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. Of the CG variants falling within the regions targeted in exome sequencing, 11% were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only, and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only, and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven) demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.
Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
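The concordance and pipeline-unique percentages reported in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the formula, so the definition used here (calls shared by every pipeline divided by the union of all calls) is an assumption, and the function names, the `Variant` tuple, and the toy call sets are hypothetical illustrations, not the authors' code or data.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of the union of all call sets that every pipeline reports.

    Assumed definition: |intersection of all sets| / |union of all sets|.
    """
    sets = list(calls.values())
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_fraction(calls: Dict[str, Set[Variant]], name: str) -> float:
    """Fraction of the union that only the named pipeline reports."""
    others = set().union(*(s for n, s in calls.items() if n != name))
    union = others | calls[name]
    if not union:
        return 0.0
    return len(calls[name] - others) / len(union)


# Toy call sets (made up for illustration only).
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 250, "C", "T")
v3 = ("chr2", 900, "G", "A")
v4 = ("chr3", 42, "T", "C")

toy_calls = {
    "pipeline_a": {v1, v2, v3},
    "pipeline_b": {v1, v2, v4},
    "pipeline_c": {v1, v2},
}

print(f"concordance: {concordance(toy_calls):.2f}")
print(f"unique to pipeline_a: {unique_fraction(toy_calls, 'pipeline_a'):.2f}")
```

For indels, the abstract notes that coordinates were left-normalized and intervalized by 20 base pairs before comparison; exact-match keys like the ones above would need to be replaced by a position-tolerant match to mirror that step.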
Pages: 18