Comprehensive evaluation of structural variant genotyping methods based on long-read sequencing data

Cited by: 12
Authors
Duan, Xiaoke [1, 2]
Pan, Mingpei [1, 2]
Fan, Shaohua [1]
Affiliations
[1] Fudan Univ, Zhangjiang Fudan Int Innovat Ctr, Human Phenome Inst, State Key Lab Genet Engn, Shanghai 200438, Peoples R China
[2] Fudan Univ, Sch Life Sci, Dept Anthropol & Human Genet, MOE Key Lab Contemporary Anthropol, Shanghai 200433, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Long-read sequencing; SV genotyping; F1 score; Performance evaluation; EVOLUTION; SELECTION; MUTATION; IMPACT
DOI
10.1186/s12864-022-08548-y
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Structural variants (SVs) play a crucial role in gene regulation, trait association, and disease in humans. SV genotyping has been extensively applied in genomics research and clinical diagnosis. Although a growing number of SV genotyping methods for long reads have been developed, a comprehensive performance assessment of these methods has not yet been carried out.
Results: Based on one simulated and three real SV datasets, we performed an in-depth evaluation of five SV genotyping methods: cuteSV, LRcaller, Sniffles, SVJedi, and VaPoR. The results show that for insertions and deletions, cuteSV and LRcaller have similar F1 scores (cuteSV, insertions: 0.69-0.90, deletions: 0.77-0.90; LRcaller, insertions: 0.67-0.87, deletions: 0.74-0.91) and are superior to the other methods. For duplications, inversions, and translocations, LRcaller yields the most accurate genotyping results (0.84, 0.68, and 0.47, respectively). When genotyping SVs located in tandem repeat regions or with imprecise breakpoints, cuteSV (insertions and deletions) and LRcaller (duplications, inversions, and translocations) perform better than the other methods. In addition, we observed a decrease in F1 scores as SV size increased. Finally, our analyses suggest that the F1 scores of these methods reach the point of diminishing returns at 20x depth of coverage.
Conclusions: We present an in-depth benchmark study of long-read SV genotyping methods. Our results highlight the advantages and disadvantages of each genotyping method, providing practical guidance for optimal application selection and prospective directions for tool improvement.
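The F1 scores quoted in the abstract are the harmonic mean of precision and recall. The abstract does not say how true positives, false positives, and false negatives were counted for each genotyping method, so the following is only a minimal sketch of the standard formula. The function name `f1_score` and the example counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    Assumption: tp, fp, and fn are counts of correctly genotyped SVs,
    incorrectly reported SVs, and missed SVs. How the paper assigns
    calls to these classes is not stated in the abstract.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not data from the study).
example = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {example:.2f}")
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, which is why a single F1 value is often used to compare tools across SV types.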
Pages: 14