GraphAligner: rapid and versatile sequence-to-graph alignment

Citations: 95
Authors
Rautiainen, Mikko [1, 2, 3]
Marschall, Tobias [4]
Affiliations
[1] Saarland Univ, Ctr Bioinformat, Saarland Informat Campus E2-1, D-66123 Saarbrucken, Germany
[2] Max Planck Inst Informat, Saarland Informat Campus E1-4, D-66123 Saarbrucken, Germany
[3] Saarbrucken Grad Sch Comp Sci, Saarland Informat Campus E1-3, D-66123 Saarbrucken, Germany
[4] Heinrich Heine Univ Dusseldorf, Fac Med, Inst Med Biometry & Bioinformat, Moorenstr 5, D-40225 Dusseldorf, Germany
Keywords
Genome graphs; Sequence alignment; Pangenome; Error correction; Long reads; READ ALIGNMENT; ACCURATE; ALGORITHM; SEARCH; SEED;
DOI
10.1186/s13059-020-02157-2
Chinese Library Classification (CLC) number
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner
Pages: 28