Performance analysis of conventional and AI-based variant callers using short and long reads

Cited by: 9
Authors
Abdelwahab, Omar [1, 2, 3, 4]
Belzile, Francois [1, 2, 3]
Torkamaneh, Davoud [1, 2, 3, 4]
Affiliations
[1] Univ Laval, Dept Phytol, Quebec City, PQ, Canada
[2] Univ Laval, Inst Biol Integrat & Syst IBIS, Quebec City, PQ, Canada
[3] Univ Laval, Ctr Rech & Innovat Vegetaux CRIV, Quebec City, PQ, Canada
[4] Univ Laval, Inst Intelligence & Donnees IID, Quebec City, PQ, Canada
Keywords
Genomics; Sequencing; Variant calling; NGS; Artificial intelligence
DOI
10.1186/s12859-023-05596-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The accurate detection of variants is essential for genomics-based studies. Various tools are currently available for detecting genomic variants; however, deciding which tool to use has always been a challenge, especially since major genome projects have chosen different tools. Thus far, most existing tools were developed mainly for short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore Technologies, ONT) have recently been shown to be usable for variant calling as well. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational power, and ease of use.
Results: In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost, using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome in a Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones in most aspects for calling SNVs and INDELs, using both long and short reads. In addition, we demonstrate the advantages and drawbacks of each tool while ranking them in each aspect of these comparisons.
Conclusion: This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.
Pages: 13