Benchmarking variant callers in next-generation and third-generation sequencing analysis

Cited by: 60
Authors
Pei, Surui [1, 2]
Liu, Tao [2 ]
Ren, Xue [2 ]
Li, Weizhong [3 ]
Chen, Chongjian [2 ]
Xie, Zhi [4 ]
Affiliations
[1] Sun Yat Sen Univ, Zhongshan Ophthalm Ctr, Guangzhou, Peoples R China
[2] Annoroad Gene Technol Beijing Co Ltd, Beijing 100176, Peoples R China
[3] Sun Yat Sen Univ, Zhongshan Sch Med, Guangzhou, Peoples R China
[4] Sun Yat Sen Univ, Zhongshan Ophthalm Ctr, Bioinformat, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
variant callers; germline variant; somatic variant;
DOI
10.1093/bib/bbaa148
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
DNA variants represent an important source of genetic variation among individuals. Next-generation sequencing (NGS) is the most popular technology for genome-wide variant calling. Third-generation sequencing (TGS) has also recently been used in genetic studies. Although many variant callers are available, no single caller can call both types of variants on NGS or TGS data with high sensitivity and specificity. In this study, we systematically evaluated 11 variant callers on 12 NGS and TGS datasets. For germline variant calling on NGS data, we tested the DNAseq and DNAscope modes from Sentieon, the HaplotypeCaller mode from GATK and the WGS mode from DeepVariant. All four callers had comparable performance on NGS data, and 30x coverage was recommended for WGS data. For germline variant calling on TGS data, we tested the DNAseq mode from Sentieon, the HaplotypeCaller mode from GATK and the PACBIO mode from DeepVariant. All three callers had similar performance in SNP calling, while DeepVariant outperformed the others in InDel calling. TGS detected more variants than NGS, particularly in complex and repetitive regions. For somatic variant calling on NGS data, we tested the TNscope and TNseq modes from Sentieon, the Mutect2 mode from GATK, NeuSomatic, VarScan2 and Strelka2. TNscope and Mutect2 outperformed the other callers. Increasing tumor sample purity from 10% to 20% significantly increased recall. Finally, the computational costs of the callers were compared, and Sentieon required the least. These results suggest that careful selection of a tool and its parameters is needed for accurate SNP or InDel calling under different scenarios.
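The abstract reports caller performance in terms of sensitivity (recall) and specificity. As a rough illustration of how such figures are typically derived, the following minimal sketch compares a call set against a truth set and computes precision, recall and F1. It is not the authors' evaluation pipeline: the names `Counts`, `compare_calls` and the tuple encoding of a variant are hypothetical, and variants are matched by exact (chromosome, position, ref, alt) identity, whereas real benchmarking tools also normalize equivalent variant representations.

```python
from dataclasses import dataclass
from typing import Set, Tuple

# Hypothetical variant encoding: (chromosome, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


@dataclass(frozen=True)
class Counts:
    tp: int  # calls present in the truth set
    fp: int  # calls absent from the truth set
    fn: int  # truth variants that were not called

    @property
    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    @property
    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def compare_calls(truth: Set[Variant], calls: Set[Variant]) -> Counts:
    """Count exact matches between a call set and a truth set (assumed matching rule)."""
    tp = len(truth & calls)
    return Counts(tp=tp, fp=len(calls - truth), fn=len(truth - calls))


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "GA"), ("chr2", 900, "T", "C")}
    calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "GA"), ("chr3", 7, "A", "C")}
    c = compare_calls(truth, calls)
    print(f"precision={c.precision:.2f} recall={c.recall:.2f} f1={c.f1:.2f}")
```

Computing these metrics separately for SNPs and InDels, as the abstract's comparisons imply, would only require partitioning both sets by variant type before calling `compare_calls`.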
Pages: 11