SAMBLASTER: fast duplicate marking and structural variant read extraction

Cited by: 574
Authors
Faust, Gregory G. [1]
Hall, Ira M. [1, 2]
Affiliations
[1] Univ Virginia, Dept Biochem & Mol Genet, Charlottesville, VA 22908 USA
[2] Univ Virginia, Ctr Publ Hlth Genom, Charlottesville, VA 22908 USA
DOI
10.1093/bioinformatics/btu314
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times.
Results: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results.
Pages: 2503-2505
Page count: 3
Related papers
3 in total
[1] YAHA: fast and flexible long-read alignment with optimal breakpoint detection [J].
Faust, Gregory G.; Hall, Ira M.
BIOINFORMATICS, 2012, 28(19): 2417-2424
[2] The Sequence Alignment/Map format and SAMtools [J].
Li, Heng; Handsaker, Bob; Wysoker, Alec; Fennell, Tim; Ruan, Jue; Homer, Nils; Marth, Gabor; Abecasis, Goncalo; Durbin, Richard.
BIOINFORMATICS, 2009, 25(16): 2078-2079
[3] Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome [J].
Quinlan, Aaron R.; Clark, Royden A.; Sokolova, Svetlana; Leibowitz, Mitchell L.; Zhang, Yujun; Hurles, Matthew E.; Mell, Joshua C.; Hall, Ira M.
GENOME RESEARCH, 2010, 20(5): 623-635