Comparative evaluation of SNVs, indels, and structural variations detected with short- and long-read sequencing data

Times cited: 8
Authors
Kosugi, Shunichi [1, 2, 3, 4]
Terao, Chikashi [3, 4, 5]
Affiliations
[1] Res Org Informat & Syst, Ctr Genome Informat, Joint Support Ctr Data Sci Res, Shizuoka, Japan
[2] Natl Inst Genet, Adv Genom Ctr, Shizuoka, Japan
[3] RIKEN Ctr Integrat Med Sci, Lab Stat & Translat Genet, Yokohama, Kanagawa, Japan
[4] Shizuoka Prefectural Gen Hosp, Clin Res Ctr, Shizuoka, Japan
[5] Univ Shizuoka, Sch Pharmaceut Sci, Dept Appl Genet, Shizuoka, Japan
Funding
Japan Society for the Promotion of Science
Keywords
DISCOVERY; VARIANTS; AWARE
DOI
10.1038/s41439-024-00276-x
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Short- and long-read sequencing technologies are routinely used to detect DNA variants, including SNVs, indels, and structural variations (SVs). However, the differences in the quality and quantity of variants detected between short- and long-read data are not fully understood. In this study, we comprehensively evaluated the variant calling performance of short- and long-read-based SNV, indel, and SV detection algorithms (6 for SNVs, 12 for indels, and 13 for SVs) using a novel evaluation framework incorporating manual visual inspection. The results showed that indel-insertion calls greater than 10 bp were poorly detected by short-read-based detection algorithms compared to long-read-based algorithms; however, the recall and precision of SNV and indel-deletion detection were similar between short- and long-read data. The recall of SV detection with short-read-based algorithms was significantly lower in repetitive regions, especially for small- to intermediate-sized SVs, than that detected with long-read-based algorithms. In contrast, the recall and precision of SV detection in nonrepetitive regions were similar between short- and long-read data. These findings suggest the need for refined strategies, such as incorporating multiple variant detection algorithms, to generate a more complete set of variants using short-read data.
Pages: 10