ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Cited by: 10497
Authors
Wang, Kai [1]
Li, Mingyao [2]
Hakonarson, Hakon [1,3]
Affiliations
[1] Childrens Hosp Philadelphia, Ctr Appl Genom, Philadelphia, PA 19104 USA
[2] Univ Penn, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pediat, Philadelphia, PA 19104 USA
Keywords
SNPS; ASSOCIATION; GENOMES
DOI
10.1093/nar/gkq603
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ~4 min to perform gene-based annotation and ~15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
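The stepwise 'variants reduction' idea described in the abstract can be illustrated with a minimal Python sketch. This is not ANNOVAR's implementation or interface: the `Variant` record, its field names, the set of protein-altering consequences, the order of the filters, and the final "two or more remaining variants per gene" rule for a recessive disease are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, already-annotated variant record (not an ANNOVAR format)."""
    gene: str
    consequence: str          # e.g. "nonsynonymous", "synonymous", "splicing"
    in_conserved_region: bool
    in_1000_genomes: bool
    in_dbsnp: bool


# Assumed set of consequences treated as potentially protein-altering.
PROTEIN_ALTERING = {"nonsynonymous", "splicing", "frameshift", "stopgain"}


def reduce_variants(variants):
    """Apply the filters one after another and record how many variants survive each step."""
    steps = [
        ("protein-altering", lambda v: v.consequence in PROTEIN_ALTERING),
        ("in conserved region", lambda v: v.in_conserved_region),
        ("absent from 1000 Genomes", lambda v: not v.in_1000_genomes),
        ("absent from dbSNP", lambda v: not v.in_dbsnp),
    ]
    remaining = list(variants)
    counts = [("input", len(remaining))]
    for label, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts


def recessive_candidate_genes(variants, min_hits=2):
    """Assumed recessive rule: keep genes that still carry at least `min_hits` variants."""
    hits = Counter(v.gene for v in variants)
    return sorted(gene for gene, n in hits.items() if n >= min_hits)
```

Recording the count after every step makes it easy to see which filter removes the most variants, which is the kind of stepwise narrowing the abstract reports when going from 4.7 million variants to 20 candidate genes.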
Pages: 7