SVcnn: an accurate deep learning-based method for detecting structural variation based on long-read data

Cited by: 7
Authors
Zheng, Yan [1]
Shang, Xuequn [1]
Affiliations
[1] Northwestern Polytechnical University, School of Computer Science, West Youyi Road 127, Xi'an 710072, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Long-read sequencing data; Structural variations; SV caller; Deep learning; PAIRED-END; IMPACT; VARIANTS; INDELS; CANCER;
DOI
10.1186/s12859-023-05324-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Structural variations (SVs) are variations in an organism's chromosome structure that exceed 50 base pairs in length. They play a significant role in genetic diseases and evolutionary mechanisms. Although long-read sequencing technology has led to the development of numerous SV callers, their performance remains suboptimal. Researchers have observed that current SV callers often miss true SVs and report many false SVs, especially in repetitive regions and in regions with multi-allelic SVs. These errors stem from the messy alignments of long-read data, which are caused by the high error rate of the reads. A more accurate SV caller is therefore needed.
Results: We propose SVcnn, a more accurate deep learning-based method for detecting SVs from long-read sequencing data. We run SVcnn and other SV callers on three real datasets and find that SVcnn improves the F1-score by 2-8% compared with the second-best method when the read depth is greater than 5x. More importantly, SVcnn performs better at detecting multi-allelic SVs.
Conclusions: SVcnn is an accurate deep learning-based method for detecting SVs. The program is available at https://github.com/nwpuzhengyan/SVcnn.
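For readers unfamiliar with the metric quoted in the abstract, the sketch below shows the standard definition of the F1-score as the harmonic mean of precision and recall. It is an illustration only: the function name `f1_score` and the example counts are hypothetical, and the abstract does not state how true and false calls were matched against the benchmark.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    tp: SV calls that match a benchmark SV (true positives)
    fp: SV calls with no matching benchmark SV (false positives)
    fn: benchmark SVs that were not called (false negatives)
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two callers; these are NOT results from the paper.
baseline = f1_score(tp=900, fp=100, fn=100)
improved = f1_score(tp=950, fp=50, fn=50)
print(f"baseline F1 = {baseline:.3f}, improved F1 = {improved:.3f}")
```

Because F1 penalizes both missed SVs and false calls, a caller only gains on this metric by improving one without sacrificing the other.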
Pages: 19