Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype

Citations: 8399
Authors
Kim, Daehwan [1]
Paggi, Joseph M. [2]
Park, Chanhee [1]
Bennett, Christopher [1]
Salzberg, Steven L. [3, 4, 5, 6]
Affiliations
[1] Univ Texas Southwestern Med Ctr Dallas, Lyda Hill Dept Bioinformat, Dallas, TX 75390 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[3] Johns Hopkins Univ, Sch Med, McKusick Nathans Inst Genet Med, Ctr Computat Biol, Baltimore, MD USA
[4] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[5] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[6] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21205 USA
Keywords
READ ALIGNMENT; MUTATIONS; DATABASE;
DOI
10.1038/s41587-019-0201-4
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
The human reference genome represents only a small number of individuals, which limits its usefulness for genotyping. We present a method named HISAT2 (hierarchical indexing for spliced alignment of transcripts 2) that can align both DNA and RNA sequences using a graph Ferragina-Manzini index. We use HISAT2 to represent and search an expanded model of the human reference genome in which over 14.5 million genomic variants in combination with haplotypes are incorporated into the data structure used for searching and alignment. We benchmark HISAT2 using simulated and real datasets to demonstrate that our strategy of representing a population of genomes, together with a fast, memory-efficient search algorithm, provides more detailed and accurate variant analyses than other methods. We apply HISAT2 for HLA typing and DNA fingerprinting; both applications form part of the HISAT-genotype software that enables analysis of haplotype-resolved genes or genomic regions. HISAT-genotype outperforms other computational methods and matches or exceeds the performance of laboratory-based assays.
Pages: 907-+
Page count: 10