Twelve years of SAMtools and BCFtools

Cited by: 6763
Authors
Danecek, Petr [1]
Bonfield, James K. [1]
Liddle, Jennifer [1]
Marshall, John [2]
Ohan, Valeriu [1]
Pollard, Martin O. [1]
Whitwham, Andrew [1]
Keane, Thomas [3]
McCarthy, Shane A. [1]
Davies, Robert M. [1]
Li, Heng [4, 5]
Affiliations
[1] Wellcome Sanger Inst, Wellcome Genome Campus, Hinxton CB10 1SA, Cambs, England
[2] Univ Glasgow, Wolfson Wohl Canc Res Ctr, Inst Canc Sci, Switchback Rd, Glasgow G61 1QH, Lanark, Scotland
[3] EMBL EBI, Wellcome Genome Campus, Hinxton CB10 1SD, Cambs, England
[4] Dana Farber Canc Inst, Dept Data Sci, 450 Brookline Ave, Boston, MA 02215 USA
[5] Harvard Med Sch, Dept Biomed Informat, 10 Shattuck St, Boston, MA 02215 USA
Funding
Wellcome Trust (UK);
Keywords
samtools; bcftools; high-throughput sequencing; next generation sequencing; variant calling; data analysis; DISCOVERY; FRAMEWORK; ALIGNMENT;
DOI
10.1093/gigascience/giab008
Chinese Library Classification number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Background: SAMtools and BCFtools are widely used programs for processing and analysing high-throughput sequencing data. They include tools for file format conversion and manipulation, sorting, querying, statistics, variant calling, and effect analysis amongst other methods. Findings: The first version appeared online 12 years ago and has been maintained and further developed ever since, with many new features and improvements added over the years. The SAMtools and BCFtools packages represent a unique collection of tools that have been used in numerous other software projects and countless genomic pipelines. Conclusion: Both SAMtools and BCFtools are freely available on GitHub under the permissive MIT licence, free for both non-commercial and commercial use. Both packages have been installed >1 million times via Bioconda. The source code and documentation are available from https://www.htslib.org.
Pages: 4