SVIM: structural variant identification using mapped long reads

Times cited: 177
Authors
Heller, David [1]
Vingron, Martin [1]
Affiliations
[1] Max Planck Inst Mol Genet, Dept Computat Mol Biol, D-14195 Berlin, Germany
Keywords
HUMAN GENOME; SIMULATION
DOI
10.1093/bioinformatics/btz041
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Structural variants are defined as genomic variants larger than 50 bp. They have been shown to affect more bases in any given genome than single-nucleotide polymorphisms or small insertions and deletions. Additionally, they have great impact on human phenotype and diversity and have been linked to numerous diseases. Due to their size and association with repeats, they are difficult to detect by shotgun sequencing, especially when based on short reads. Long-read, single-molecule sequencing technologies like those offered by Pacific Biosciences or Oxford Nanopore Technologies produce reads with a length of several thousand base pairs. Despite the higher error rate and sequencing cost, long-read sequencing offers many advantages for the detection of structural variants. Yet, available software tools still do not fully exploit the possibilities.
Results: We present SVIM, a tool for the sensitive detection and precise characterization of structural variants from long-read data. SVIM consists of three components for the collection, clustering and combination of structural variant signatures from read alignments. It discriminates five different variant classes including similar types, such as tandem and interspersed duplications and novel element insertions. SVIM is unique in its capability of extracting both the genomic origin and destination of duplications. It compares favorably with existing tools in evaluations on simulated data and real datasets from Pacific Biosciences and Nanopore sequencing machines.
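To make the collect / cluster / combine idea from the abstract concrete, the following is a minimal illustrative sketch in Python. It is not SVIM's implementation. All names (`Signature`, `collect`, `cluster`, `combine`), the 100 bp clustering distance and the minimum support of two reads are assumptions chosen for illustration. Only the 50 bp size threshold comes from the abstract. The sketch handles only insertions and deletions found inside a single alignment's CIGAR string, and its `combine` step merely summarises each cluster, whereas the abstract describes a combination step that distinguishes five variant classes.

```python
import re
from dataclasses import dataclass

MIN_SV_SIZE = 50  # abstract: structural variants are larger than 50 bp
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass(frozen=True)
class Signature:
    sv_type: str  # "DEL" or "INS" (hypothetical labels)
    start: int    # 0-based reference position of the event
    length: int   # event length in bp


def collect(ref_start, cigar):
    """Collect: scan one alignment's CIGAR string for long indels."""
    signatures = []
    pos = ref_start
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op == "D" and n > MIN_SV_SIZE:
            signatures.append(Signature("DEL", pos, n))
        elif op == "I" and n > MIN_SV_SIZE:
            signatures.append(Signature("INS", pos, n))
        if op in "MDN=X":  # only these operations consume the reference
            pos += n
    return signatures


def cluster(signatures, max_distance=100):
    """Cluster: group same-type signatures with nearby start positions."""
    clusters = []
    for sig in sorted(signatures, key=lambda s: (s.sv_type, s.start)):
        last = clusters[-1] if clusters else None
        if (last and last[-1].sv_type == sig.sv_type
                and sig.start - last[-1].start <= max_distance):
            last.append(sig)
        else:
            clusters.append([sig])
    return clusters


def combine(clusters, min_support=2):
    """Combine (simplified): one call per cluster with enough reads.

    Each call is (type, start, length, supporting reads); start and
    length are taken from the middle element of the sorted values.
    """
    calls = []
    for group in clusters:
        if len(group) >= min_support:
            starts = sorted(s.start for s in group)
            lengths = sorted(s.length for s in group)
            calls.append((group[0].sv_type, starts[len(starts) // 2],
                          lengths[len(lengths) // 2], len(group)))
    return calls


# Toy input: (reference start, CIGAR) pairs for four hypothetical reads.
alignments = [(1000, "500M120D300M"), (1010, "495M118D200M"),
              (5000, "200M80I100M"), (900, "100M10D50M")]
signatures = [s for start, cigar in alignments for s in collect(start, cigar)]
print(combine(cluster(signatures)))
```

In this toy input, two reads support a deletion near position 1500 and are merged into one call. The single-read insertion is discarded for lack of support, and the 10 bp deletion falls below the size threshold.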
Pages: 2907-2915
Page count: 9