SVsearcher: A more accurate structural variation detection method in long read data

Times cited: 3
Authors
Zheng, Yan [1]
Shang, Xuequn [1]
Sung, Wing-Kin [2,3,4]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, West Youyi Rd 127, Xian 710072, Peoples R China
[2] Chinese Univ Hong Kong, Dept Chem Pathol, Hong Kong, Peoples R China
[3] Hong Kong Genome Inst, Shatin, Hong Kong Sci Pk, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Li Ka Shing Inst Hlth Sci, Lab Computat Genom, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Long-read sequencing data; Structural variations; SV detection; PAIRED-END; VARIANTS; IMPACT; INDELS; CANCER
DOI
10.1016/j.compbiomed.2023.106843
Chinese Library Classification number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Structural variations (SVs) are genomic rearrangements (such as deletions, insertions, and inversions) larger than 50 bp. They play important roles in genetic diseases and evolutionary mechanisms. Advances in long-read sequencing (PacBio and Oxford Nanopore Technologies (ONT) long-read sequencing) make it possible to call SVs accurately. However, for ONT long reads, we observe that existing long-read SV callers miss many true SVs and report many false SVs in repetitive regions and in regions with multi-allelic SVs. These errors are caused by messy alignments of ONT reads, which stem from their high error rate. Hence, we propose a novel method, SVsearcher, to address these issues. We run SVsearcher and other callers on three real datasets and find that SVsearcher improves the F1 score by approximately 10% for high-coverage (50x) datasets and by more than 25% for low-coverage (10x) datasets. More importantly, SVsearcher identifies 81.7%-91.8% of multi-allelic SVs, whereas existing methods identify only 13.2% (Sniffles) to 54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.
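The abstract reports accuracy as F1 scores measured against a set of true SVs. The sketch below illustrates how such a benchmark is commonly scored. It counts true positives (TP), false positives (FP), and false negatives (FN), then combines precision and recall into F1.

This is a hypothetical example, not the paper's evaluation protocol. The following are all assumptions:
- the matching rule (same chromosome and SV type, breakpoints within 500 bp, greedy one-to-one);
- the 500 bp tolerance itself;
- the names `SV`, `match_calls`, and `f1_score`.

Only the "larger than 50 bp" size threshold comes from the record.

```python
from dataclasses import dataclass

MIN_SV_SIZE = 50  # SVs are defined as variants larger than 50 bp


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int     # breakpoint position (hypothetical single-coordinate model)
    size: int    # length in bp
    svtype: str  # e.g. "DEL", "INS", "INV"


def match_calls(calls, truth, max_dist=500):
    """Greedy one-to-one matching of calls to truth SVs (assumed criterion).

    A call counts as a TP if an unmatched truth SV on the same chromosome,
    with the same type, lies within max_dist bp of it. Calls not larger
    than MIN_SV_SIZE are discarded before matching.
    """
    calls = [c for c in calls if c.size > MIN_SV_SIZE]
    used = set()  # indices of truth SVs already matched
    tp = 0
    for c in calls:
        for i, t in enumerate(truth):
            if (i not in used and t.chrom == c.chrom and t.svtype == c.svtype
                    and abs(t.pos - c.pos) <= max_dist):
                used.add(i)
                tp += 1
                break
    fp = len(calls) - tp       # calls with no matching truth SV
    fn = len(truth) - len(used)  # truth SVs that no call recovered
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # fraction of calls that are true
    recall = tp / (tp + fn)     # fraction of true SVs that were called
    return 2 * precision * recall / (precision + recall)


truth = [SV("chr1", 1000, 300, "DEL"), SV("chr1", 5000, 120, "INS"),
         SV("chr2", 2000, 800, "INV")]
calls = [SV("chr1", 1100, 310, "DEL"), SV("chr1", 5020, 30, "INS"),
         SV("chr2", 9000, 500, "DEL")]
tp, fp, fn = match_calls(calls, truth)
print(tp, fp, fn, f1_score(tp, fp, fn))
```

In this toy example, the 30 bp insertion is dropped by the size filter. Only the deletion on chr1 matches, which gives precision 0.5, recall 1/3, and F1 0.4. Real benchmarks typically also compare SV sizes and sequence similarity, so this greedy rule should be read only as an illustration of how TP, FP, and FN feed into F1.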
Pages: 1-10
Number of pages: 10