SAMBLASTER: fast duplicate marking and structural variant read extraction
Times cited: 574
Authors:
Faust, Gregory G. [1]
Hall, Ira M. [1,2]
Affiliations:
[1] Univ Virginia, Dept Biochem & Mol Genet, Charlottesville, VA 22908 USA
[2] Univ Virginia, Ctr Publ Hlth Genom, Charlottesville, VA 22908 USA
DOI: 10.1093/bioinformatics/btu314
Chinese Library Classification: Q5 [Biochemistry]
Subject classification codes: 071010; 081704
Abstract:
Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the requirement to read, write, sort and compress large BAM files multiple times.
Results: We present SAMBLASTER, a tool that reduces the number of times such costly operations are performed. SAMBLASTER is designed to mark duplicates in read-sorted SAM files as a piped post-pass on DNA aligner output before it is compressed to BAM. In addition, it can simultaneously output into separate files the discordant read-pairs and/or split-read mappings used for structural variant calling. As an alignment post-pass, its own runtime overhead is negligible, while dramatically reducing overall pipeline complexity and runtime. As a stand-alone duplicate marking tool, it performs significantly better than PICARD or SAMBAMBA in terms of both speed and memory usage, while achieving nearly identical results.
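Illustrative note: the abstract describes duplicate marking as a single streaming pass over aligner output in which the records of each read pair are adjacent. The Python sketch below shows one common way such a pass can work; it is not SAMBLASTER's actual implementation, and the function names (`five_prime_pos`, `mark_duplicates`) are hypothetical. It assumes that a pair's signature is the chromosome, strand and unclipped 5' coordinate of each primary alignment, that the first pair seen with a given signature is kept, and that pairs with an unmapped mate are left unmarked.

```python
import re

DUPLICATE = 0x400      # SAM flag bit: PCR or optical duplicate
REVERSE = 0x10         # SAM flag bit: aligned to the reverse strand
UNMAPPED = 0x4         # SAM flag bit: read is unmapped
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def five_prime_pos(pos, cigar, reverse):
    """Return the unclipped 5' reference coordinate (1-based) of an alignment."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    if reverse:
        ops = ops[::-1]  # on the reverse strand the 5' end is the right end
    clipped = 0
    for n, op in ops:
        if op not in "SH":
            break
        clipped += n
    if not reverse:
        return pos - clipped
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    return pos + ref_len - 1 + clipped


def _emit(group, seen):
    """Flag every record of one read pair if its signature was seen before."""
    ends = []
    for fields in group:
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # only primary alignments contribute to the signature
        if flag & UNMAPPED:
            ends = None  # simplification: leave such pairs unmarked
            break
        reverse = bool(flag & REVERSE)
        ends.append((fields[2], reverse,
                     five_prime_pos(int(fields[3]), fields[5], reverse)))
    if ends:
        signature = tuple(sorted(ends))
        if signature in seen:
            for fields in group:
                fields[1] = str(int(fields[1]) | DUPLICATE)
        else:
            seen.add(signature)
    return ["\t".join(fields) for fields in group]


def mark_duplicates(lines):
    """Stream SAM lines grouped by read name, yielding them with duplicates flagged."""
    seen = set()
    group = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):  # header lines pass through unchanged
            yield line
            continue
        fields = line.split("\t")
        if group and fields[0] != group[0][0]:
            yield from _emit(group, seen)
            group = []
        group.append(fields)
    if group:
        yield from _emit(group, seen)
```

Because the generator only needs the current pair and a set of signatures in memory, it could be placed between an aligner and a BAM compressor by iterating over `sys.stdin` and printing each yielded line, which mirrors the piped post-pass arrangement described in the abstract.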
Pages: 2503-2505
Number of pages: 3