Alfred: interactive multi-sample BAM alignment statistics, feature counting and feature annotation for long- and short-read sequencing

Cited by: 52
Authors
Rausch, Tobias [1, 2]
Fritz, Markus Hsi-Yang [2]
Korbel, Jan O. [2]
Benes, Vladimir [1]
Affiliations
[1] EMBL, Genomics Core Facility, Meyerhofstr. 1, D-69117 Heidelberg, Germany
[2] EMBL, Genome Biology Unit, Meyerhofstr. 1, D-69117 Heidelberg, Germany
Keywords
QUALITY-CONTROL
DOI
10.1093/bioinformatics/bty1007
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Harmonizing quality control (QC) of large-scale second- and third-generation sequencing datasets is key for enabling downstream computational and biological analyses. We present Alfred, an efficient and versatile command-line application that computes multi-sample QC metrics in a read-group aware manner, across a wide variety of sequencing assays and technologies. In addition to standard QC metrics such as GC bias, base composition, insert size and sequencing coverage distributions, it supports haplotype-aware and allele-specific feature counting and feature annotation. The versatility of Alfred allows for easy pipeline integration in high-throughput settings, including DNA sequencing facilities and large-scale research initiatives, enabling continuous monitoring of sequence data quality and characteristics across samples. Alfred supports haplo-tagging of BAM/CRAM files to conduct haplotype-resolved analyses in conjunction with a variety of next-generation sequencing based assays. Alfred's companion web application enables interactive exploration of results and comparison to public datasets.
Availability and implementation: Alfred is open-source and freely available at https://tobiasrausch.com/alfred/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 2489-2491
Page count: 3