Comparison of three variant callers for human whole genome sequencing

Times cited: 48
Authors
Supernat, Anna [1, 2]
Vidarsson, Oskar Valdimar [3]
Steen, Vidar M. [4, 5, 6]
Stokowy, Tomasz [3, 4, 5, 6]
Affiliations
[1] Univ Gdansk, Intercollegiate Fac Biotechnol, Lab Cell Biol, Gdansk, Poland
[2] Med Univ Gdansk, Gdansk, Poland
[3] Univ Bergen, Inst Informat, Computat Biol Unit, Bergen, Norway
[4] Univ Bergen, NORMENT, Bergen, Norway
[5] Univ Bergen, KJ Jebsen Ctr Psychosis Res, Dept Clin Sci, Bergen, Norway
[6] Haukeland Hosp, Dept Med Genet, Dr E Martens Res Grp Biol Psychiat, Bergen, Norway
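Source
SCIENTIFIC REPORTS | 2018 / Volume 8
Keywords
FRAMEWORK;
DOI
10.1038/s41598-018-36177-7
Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code
07; 0710; 09;
Abstract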
Testing of patients with genetics-related disorders is shifting from single-gene assays to gene panel sequencing, whole-exome sequencing (WES) and whole-genome sequencing (WGS). Since WGS is unquestionably becoming a new foundation for molecular analyses, we decided to compare three currently used tools for variant calling of human whole-genome sequencing data. We tested DeepVariant, a new TensorFlow machine learning-based variant caller, and compared it to GATK 4.0 and SpeedSeq, using 30x, 15x and 10x WGS data of the well-known NA12878 DNA reference sample. According to our comparison, performance on SNV calling was nearly identical in 30x data, with all three variant callers reaching F-scores (i.e. the harmonic mean of recall and precision) of 0.98. In contrast, DeepVariant outperformed GATK and SpeedSeq in indel calling, with F-scores of 0.94, 0.90 and 0.84, respectively. We conclude that the DeepVariant tool has great potential and usefulness for analysis of WGS data in medical genetics.
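The abstract defines the F-score as the harmonic mean of recall and precision. A minimal sketch of that calculation follows, assuming a caller's output has already been compared against a truth set and reduced to counts of true positives, false positives and false negatives. The function name `f_score` and the example counts are hypothetical and are not taken from the paper.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F-score: the harmonic mean of precision and recall."""
    called = true_positives + false_positives   # variants the caller reported
    truth = true_positives + false_negatives    # variants present in the truth set
    if true_positives == 0 or called == 0 or truth == 0:
        # Precision or recall is zero or undefined; report 0.0 by convention.
        return 0.0
    precision = true_positives / called
    recall = true_positives / truth
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the study).
example = f_score(true_positives=90, false_positives=10, false_negatives=10)
print(round(example, 2))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a caller cannot reach a high F-score by trading away recall for precision or the reverse.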
Pages: 6
Related papers
25 in total
  • [1] Abadi M, 2016, PROCEEDINGS OF OSDI'16: 12TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, P265
  • [2] New insights into the generation and role of de novo mutations in health and disease
    Acuna-Hidalgo, Rocio
    Veltman, Joris A.
    Hoischen, Alexander
    [J]. GENOME BIOLOGY, 2016, 17
  • [3] [Anonymous], BIORXIV2017174201, DOI DOI 10.1101/174201
  • [4] [Anonymous], CREATING UNIVERSAL S
  • [5] Auffray C, 2016, GENOME MED, V8, DOI [10.1186/s13073-016-0265-4, 10.1186/s13073-016-0323-y]
  • [6] WHOLE GENOME SEQUENCING TO IDENTIFY GENETIC VARIANTS UNDERLYING CARDIOVASCULAR DISEASE AMONG INDIAN ASIANS
    Chambers, J. C.
    Tan, S. T.
    Zhang, W. H.
    Sehmi, J.
    Al-Hussaini, A.
    Ramasamy, M.
    Scott, J.
    Elliott, P.
    Kooner, J. S.
    [J]. HEART, 2012, 98 : A64 - A64
  • [7] Chiang C, 2015, NAT METHODS, V12, P966, DOI [10.1038/NMETH.3505, 10.1038/nmeth.3505]
  • [8] Whole Genome Sequencing as a Diagnostic Test: Challenges and Opportunities
    Chrystoja, Caitlin C.
    Diamandis, Eleftherios P.
    [J]. CLINICAL CHEMISTRY, 2014, 60 (05) : 724 - 733
  • [9] A framework for variation discovery and genotyping using next-generation DNA sequencing data
    DePristo, Mark A.
    Banks, Eric
    Poplin, Ryan
    Garimella, Kiran V.
    Maguire, Jared R.
    Hartl, Christopher
    Philippakis, Anthony A.
    del Angel, Guillermo
    Rivas, Manuel A.
    Hanna, Matt
    McKenna, Aaron
    Fennell, Tim J.
    Kernytsky, Andrew M.
    Sivachenko, Andrey Y.
    Cibulskis, Kristian
    Gabriel, Stacey B.
    Altshuler, David
    Daly, Mark J.
    [J]. NATURE GENETICS, 2011, 43 (05) : 491 - +
  • [10] Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data
    do Valle, Italo Faria
    Giampieri, Enrico
    Simonetti, Giorgia
    Padella, Antonella
    Manfrini, Marco
    Ferrari, Anna
    Papayannidis, Cristina
    Zironi, Isabella
    Garonzi, Marianna
    Bernardi, Simona
    Delledonne, Massimo
    Martinelli, Giovanni
    Remondini, Daniel
    Castellani, Gastone
    [J]. BMC BIOINFORMATICS, 2016, 17