Comprehensive evaluation of structural variant genotyping methods based on long-read sequencing data

Cited by: 12
Authors
Duan, Xiaoke [1, 2]
Pan, Mingpei [1, 2]
Fan, Shaohua [1]
Affiliations
[1] Fudan Univ, Zhangjiang Fudan Int Innovat Ctr, Human Phenome Inst, State Key Lab Genet Engn, Shanghai 200438, Peoples R China
[2] Fudan Univ, Sch Life Sci, Dept Anthropol & Human Genet, MOE Key Lab Contemporary Anthropol, Shanghai 200433, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Long-read sequencing; SV genotyping; F1 score; Performance evaluation; EVOLUTION; SELECTION; MUTATION; IMPACT
DOI
10.1186/s12864-022-08548-y
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Structural variants (SVs) play a crucial role in gene regulation, trait association, and disease in humans. SV genotyping has been extensively applied in genomics research and clinical diagnosis. Although a growing number of SV genotyping methods for long reads have been developed, a comprehensive performance assessment of these methods has yet to be conducted.

Results: Based on one simulated and three real SV datasets, we performed an in-depth evaluation of five SV genotyping methods: cuteSV, LRcaller, Sniffles, SVJedi, and VaPoR. The results show that for insertions and deletions, cuteSV and LRcaller have similar F1 scores (cuteSV, insertions: 0.69-0.90, deletions: 0.77-0.90; LRcaller, insertions: 0.67-0.87, deletions: 0.74-0.91) and are superior to the other methods. For duplications, inversions, and translocations, LRcaller yields the most accurate genotyping results (0.84, 0.68, and 0.47, respectively). When genotyping SVs located in tandem repeat regions or with imprecise breakpoints, cuteSV (insertions and deletions) and LRcaller (duplications, inversions, and translocations) perform better than the other methods. In addition, we observed a decrease in F1 scores as SV size increased. Finally, our analyses suggest that the F1 scores of these methods reach the point of diminishing returns at 20x depth of coverage.

Conclusions: We present an in-depth benchmark study of long-read SV genotyping methods. Our results highlight the advantages and disadvantages of each genotyping method, providing practical guidance for choosing a suitable tool and pointing to directions for tool improvement.
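
The abstract reports all results as F1 scores, the harmonic mean of precision and recall. The sketch below illustrates how such a score could be computed for genotype calls. The abstract does not state how the study counts true positives, false positives, and false negatives, so the counting rules in the hypothetical `tally` helper are assumptions made for illustration only, as are the function names and the toy genotypes.

```python
from typing import Dict, Tuple

# Genotypes treated as "no variant call" (assumption for this sketch).
NON_CALLS = {"0/0", "./."}


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def tally(truth: Dict[str, str], calls: Dict[str, str]) -> Tuple[int, int, int]:
    """Count (TP, FP, FN) over SV IDs present in the truth set.

    Assumed rules (not taken from the study):
      - TP: truth is non-reference and the called genotype matches exactly.
      - FN: truth is non-reference and the call is missing or different.
      - FP: a non-reference genotype is called but does not match the truth.
    Genotype strings are compared as-is; "1/0" and "0/1" are not normalized.
    """
    tp = fp = fn = 0
    for sv_id, true_gt in truth.items():
        called = calls.get(sv_id, "./.")
        if true_gt in NON_CALLS:
            if called not in NON_CALLS:
                fp += 1
        elif called == true_gt:
            tp += 1
        else:
            fn += 1
            if called not in NON_CALLS:
                fp += 1
    return tp, fp, fn


# Toy data, invented for illustration.
example_truth = {"sv1": "0/1", "sv2": "1/1", "sv3": "0/1", "sv4": "0/0"}
example_calls = {"sv1": "0/1", "sv2": "0/1", "sv3": "./.", "sv4": "0/1"}

example_counts = tally(example_truth, example_calls)
example_scores = precision_recall_f1(*example_counts)
```

Under these rules a wrong non-reference genotype counts against both precision and recall; other benchmarks may score such calls differently, which would change the resulting F1.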
Pages: 14