nanotatoR: a tool for enhanced annotation of genomic structural variants

Cited by: 8
Authors
Bhattacharya, Surajit [1]
Barseghyan, Hayk [1, 2, 3]
Delot, Emmanuele C. [1, 2]
Vilain, Eric [1, 2]
Affiliations
[1] Childrens Natl Hosp, Ctr Genet Med Res, Childrens Res Inst, Washington, DC 20010 USA
[2] George Washington Univ, Dept Genom & Precis Med, Sch Med & Hlth Sci, Washington, DC 20052 USA
[3] Bionano Genom Inc, San Diego, CA 92121 USA
Keywords
Optical genome mapping; Annotation; Structural variants; COMPREHENSIVE EVALUATION; SEQUENCING DATA; PHENOTYPE; DATABASE; PACKAGE; DISEASE; GENE;
DOI
10.1186/s12864-020-07182-w
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: Whole genome sequencing is effective at identification of small variants, but because it is based on short reads, assessment of structural variants (SVs) is limited. The advent of Optical Genome Mapping (OGM), which utilizes long fluorescently labeled DNA molecules for de novo genome assembly and SV calling, has allowed for increased sensitivity and specificity in SV detection. However, compared to small variant annotation tools, OGM-based SV annotation software has seen little development, and currently available SV annotation tools do not provide sufficient information for determination of variant pathogenicity.
Results: We developed an R-based package, nanotatoR, which provides comprehensive annotation as a tool for SV classification. nanotatoR uses both external (DGV; DECIPHER; Bionano Genomics BNDB) and internal (user-defined) databases to estimate SV frequency. Human genome reference GRCh37/38-based BED files are used to annotate SVs with overlapping, upstream, and downstream genes. Overlap percentages and distances for nearest genes are calculated and can be used for filtration. A primary gene list is extracted from public databases based on the patient's phenotype and used to filter genes overlapping SVs, providing the analyst with an easy way to prioritize variants. If available, expression of overlapping or nearby genes of interest is extracted (e.g. from an RNA-Seq dataset, allowing the user to assess the effects of SVs on the transcriptome). Most quality-control filtration parameters are customizable by the user. The output is given in an Excel file format, subdivided into multiple sheets based on SV type and inheritance pattern (INDELs, inversions, translocations, de novo, etc.). nanotatoR passed all quality and run time criteria of Bioconductor, where it was accepted in the April 2019 release. We evaluated nanotatoR's annotation capabilities using publicly available reference datasets: the singleton sample NA12878, mapped with two types of enzyme labeling, and the NA24143 trio. nanotatoR was also able to accurately filter the known pathogenic variants in a cohort of patients with Duchenne Muscular Dystrophy for which we had previously demonstrated the diagnostic ability of OGM.
Conclusions: The extensive annotation enables users to rapidly identify potential pathogenic SVs, a critical step toward use of OGM in the clinical setting.
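The gene-overlap step described in the abstract (overlapping, upstream, and downstream genes, with overlap percentages and nearest-gene distances) can be illustrated with a minimal Python sketch. This is not nanotatoR's R implementation or its API. The names `Interval`, `overlap_percent`, `annotate_sv`, and the `window` parameter are hypothetical, coordinates are assumed to be BED-style half-open intervals, overlap is expressed as a share of the gene's length, and "upstream"/"downstream" are taken to mean lower/higher coordinates regardless of strand.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A BED-style half-open interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int
    name: str


def overlap_percent(sv: Interval, gene: Interval) -> float:
    """Percentage of the gene's length covered by the SV (assumed definition)."""
    if sv.chrom != gene.chrom:
        return 0.0
    shared = min(sv.end, gene.end) - max(sv.start, gene.start)
    if shared <= 0:
        return 0.0
    return 100.0 * shared / (gene.end - gene.start)


def annotate_sv(sv: Interval, genes: list[Interval], window: int = 100_000) -> dict:
    """Classify genes as overlapping, upstream, or downstream of one SV.

    `window` is a hypothetical cutoff (in bp) for reporting nearby genes.
    """
    overlapping, upstream, downstream = [], [], []
    for gene in genes:
        if gene.chrom != sv.chrom:
            continue
        pct = overlap_percent(sv, gene)
        if pct > 0:
            overlapping.append((gene.name, round(pct, 1)))
        elif gene.end <= sv.start and sv.start - gene.end <= window:
            # Gene lies entirely at lower coordinates than the SV.
            upstream.append((gene.name, sv.start - gene.end))
        elif gene.start >= sv.end and gene.start - sv.end <= window:
            # Gene lies entirely at higher coordinates than the SV.
            downstream.append((gene.name, gene.start - sv.end))
    # Report nearest genes first.
    upstream.sort(key=lambda item: item[1])
    downstream.sort(key=lambda item: item[1])
    return {"overlapping": overlapping, "upstream": upstream, "downstream": downstream}
```

The resulting percentages and distances are the kind of values an analyst could then threshold during filtration, for example by keeping only SVs that cover a minimum share of a gene on the phenotype-derived gene list.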
Pages: 16