SVsearcher: A more accurate structural variation detection method in long read data

Cited by: 3
Authors
Zheng, Yan [1]
Shang, Xuequn [1]
Sung, Wing-Kin [2, 3, 4]
Affiliations
[1] Northwestern Polytechnical University, School of Computer Science, West Youyi Road 127, Xi'an 710072, People's Republic of China
[2] Chinese University of Hong Kong, Department of Chemical Pathology, Hong Kong, People's Republic of China
[3] Hong Kong Genome Institute, Shatin, Hong Kong Science Park, Hong Kong, People's Republic of China
[4] Chinese University of Hong Kong, Li Ka Shing Institute of Health Sciences, Laboratory of Computational Genomics, Hong Kong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Long-read sequencing data; Structural variations; SV detection; PAIRED-END; VARIANTS; IMPACT; INDELS; CANCER;
DOI
10.1016/j.compbiomed.2023.106843
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Structural variations (SVs) are genomic rearrangements (such as deletions, insertions, and inversions) larger than 50 bp. They play important roles in genetic diseases and evolutionary mechanisms. Advances in long-read sequencing (i.e., PacBio and Oxford Nanopore (ONT) long-read sequencing) make it possible to call SVs accurately. However, for ONT long reads, we observe that existing long-read SV callers miss many true SVs and call many false SVs in repetitive regions and in regions with multi-allelic SVs. These errors are caused by messy alignments of ONT reads, which result from their high error rate. Hence, we propose a novel method, SVsearcher, to address these issues. We ran SVsearcher and other callers on three real datasets and found that SVsearcher improves the F1 score by approximately 10% for high-coverage (50x) datasets and by more than 25% for low-coverage (10x) datasets. More importantly, SVsearcher identifies 81.7%-91.8% of multi-allelic SVs, while existing methods identify only 13.2% (Sniffles) to 54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.
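The abstract compares callers by F1 score. As a reminder of what that metric measures, here is a minimal sketch in Python. The function name and the counts are hypothetical and are not taken from the paper; how a call is matched to a ground-truth SV (breakpoint tolerance, size similarity, and so on) is not described in this record and is assumed to have been done beforehand.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall.

    tp: calls that match a ground-truth SV
    fp: calls with no matching ground-truth SV
    fn: ground-truth SVs that no call matches
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two callers on the same truth set (illustration only).
baseline = f1_score(tp=800, fp=200, fn=300)
improved = f1_score(tp=900, fp=100, fn=200)
print(f"baseline F1 = {baseline:.3f}, improved F1 = {improved:.3f}")
```

Because F1 penalizes both missed SVs and false calls, a caller can only raise it by improving one without giving up too much of the other.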
Pages: 1 / 10
Number of pages: 10