A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar

Times cited: 107
Authors
Garrison, Erik [1 ]
Kronenberg, Zev N. [2 ]
Dawson, Eric T. [3 ]
Pedersen, Brent S. [4 ]
Prins, Pjotr [1 ]
Affiliations
[1] Univ Tennessee, Dept Genet Genom & Informat, Hlth Sci Ctr, Memphis, TN 38163 USA
[2] Pacific Biosci, San Diego, CA USA
[3] NVIDIA Corp, Santa Clara, CA USA
[4] Univ Med Ctr, Ctr Mol Med, Utrecht, Netherlands
Keywords
FRAMEWORK
DOI
10.1371/journal.pcbi.1009123
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Author summary: Most bioinformatics workflows deal with DNA/RNA variations that are typically represented in the variant call format (VCF), a file format that describes mutations (SNPs and MNPs), insertions and deletions (INDELs) against a reference genome. Here we present a wide range of free and open source software tools that are used in biomedical sequencing workflows around the world today.

Since its introduction in 2011, the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies, as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome.

Here we present a spectrum of over 125 useful, complementary free and open source software tools and libraries that we wrote and made available through the vcflib, bio-vcf, cyvcf2, hts-nim and slivar projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as for output of statistics, visualisation, and transformations of VCF files. These tools run every day in critical biomedical pipelines and countless shell scripts. Our tools are part of the wider bioinformatics ecosystem, and we highlight best practices.

We briefly discuss the design of VCF, lessons learnt, and how pangenome graph formats can address more complex variation that cannot easily be represented by the VCF format.
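To make concrete how a VCF record encodes SNPs, MNPs and indels against a reference, here is a minimal sketch in plain Python that parses the fixed columns of a VCF data line, assigns each ALT allele a rough variant class, and applies a simple QUAL filter. It does not use vcflib, bio-vcf, cyvcf2, hts-nim or slivar; the names `VcfRecord`, `parse_record`, `classify_allele` and `iter_records` are hypothetical helpers for illustration only. The classification is deliberately simplified: it performs no allele normalisation, treats every equal-length multi-base substitution as an MNP, and lumps symbolic alleles, breakends and `*` together as `OTHER`.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int                # 1-based position, as in the VCF POS column
    ref: str
    alts: List[str]         # empty when ALT is "."
    qual: Optional[float]   # None when QUAL is "."


def parse_record(line: str) -> VcfRecord:
    """Parse the first six tab-separated columns of a VCF data line."""
    chrom, pos, _id, ref, alt, qual = line.rstrip("\n").split("\t")[:6]
    return VcfRecord(
        chrom=chrom,
        pos=int(pos),
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
    )


def classify_allele(ref: str, alt: str) -> str:
    """Rough class of one REF/ALT pair (hypothetical, no normalisation)."""
    if alt.startswith("<") or "[" in alt or "]" in alt or alt == "*":
        return "OTHER"  # symbolic allele, breakend or spanning deletion
    if len(ref) == len(alt):
        return "SNP" if len(ref) == 1 else "MNP"
    return "INDEL"  # REF and ALT differ in length


def iter_records(lines: Iterable[str], min_qual: float = 0.0) -> Iterator[VcfRecord]:
    """Yield data records, skipping header lines and records below min_qual."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rec = parse_record(line)
        if rec.qual is not None and rec.qual < min_qual:
            continue
        yield rec


if __name__ == "__main__":
    example = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
        "chr1\t200\t.\tAC\tGT\t12\tPASS\t.\n"
        "chr1\t300\t.\tA\tATT,<DEL>\t80\tPASS\t.\n"
    )
    for rec in iter_records(example.splitlines(), min_qual=20):
        print(rec.chrom, rec.pos, [classify_allele(rec.ref, a) for a in rec.alts])
    # prints:
    # chr1 100 ['SNP']
    # chr1 300 ['INDEL', 'OTHER']
```

Real tools handle much more (INFO and FORMAT fields, genotypes, multi-allelic splitting, left-alignment), so this is only meant to show the shape of the data these tools operate on.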
Pages: 15