Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing

Times cited: 17
Authors
Misra, Sanchit [1]
Agrawal, Ankit [1]
Liao, Wei-keng [1]
Choudhary, Alok [1]
Affiliation
[1] Northwestern University, Department of Electrical Engineering and Computer Science, Evanston, IL 60208, USA
Funding
National Science Foundation (USA)
Keywords
Alignment; Search; Program; Tool
DOI
10.1093/bioinformatics/btq648
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Recently, a number of programs have been proposed for mapping short reads to a reference genome. Many of them are heavily optimized for short-read mapping and are therefore very efficient for short queries, but this makes them inefficient or inapplicable for reads longer than 200 bp. However, many sequencers already generate longer reads, and more are expected to follow. For long-read sequence mapping there are few options; BLAT, SSAHA2, FANGS and BWA-SW are among the most popular. Resequencing and personalized medicine, however, need much faster software to map these long reads to a reference genome to identify SNPs or rare transcripts.

Results: We present AGILE (AliGnIng Long rEads), a hash-table-based, high-throughput sequence mapping algorithm for longer 454 reads. Among other heuristics, it uses diagonal multiple seed-match criteria, customized q-gram filtering and a dynamic incremental search approach to optimize every step of the mapping process. In our experiments, AGILE is more accurate than BLAT and comparable in accuracy to BWA-SW and SSAHA2. For practical error rates (< 5%) and read lengths (200-1000 bp), AGILE is significantly faster than BLAT, SSAHA2 and BWA-SW. In the remaining cases, AGILE is comparable in speed to BWA-SW and several times faster than BLAT and SSAHA2.
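The abstract only names AGILE's ingredients. As a rough, generic illustration of how a hash-table index, diagonal seed counting and a q-gram threshold can fit together, here is a minimal sketch. It is not AGILE's implementation: the function names (`build_kmer_index`, `qgram_threshold`, `candidate_diagonals`, `map_read`), the fixed k-mer size and the substitution-only error model are all assumptions made for illustration. Under that error model, each mismatch can destroy at most q of the n - q + 1 overlapping q-grams of a length-n read. This gives the textbook q-gram lemma threshold n + 1 - (e + 1)q for e errors, and it keeps every surviving seed on a single diagonal.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Hash table from every k-mer of the reference to its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def qgram_threshold(n: int, q: int, errors: int) -> int:
    """q-gram lemma: a length-n read within `errors` differences of a
    reference window shares at least n + 1 - (errors + 1) * q q-grams with it."""
    return n + 1 - (errors + 1) * q


def candidate_diagonals(read: str, index: dict[str, list[int]], k: int,
                        min_seeds: int) -> list[tuple[int, int]]:
    """Count seed hits per diagonal (ref_pos - read_pos) and keep the
    diagonals supported by at least `min_seeds` hits."""
    counts: dict[int, int] = defaultdict(int)
    for i in range(len(read) - k + 1):
        for ref_pos in index.get(read[i:i + k], ()):
            counts[ref_pos - i] += 1
    return sorted((d, c) for d, c in counts.items() if c >= min_seeds)


def map_read(read: str, reference: str, index: dict[str, list[int]], k: int,
             max_errors: int) -> list[tuple[int, int]]:
    """Return (reference offset, mismatches) for each accepted placement.

    Hypothetical helper. Assumes substitution-only errors, so a true hit keeps
    all its surviving seeds on one diagonal. If the q-gram threshold drops to
    0 or below, the filter can no longer guarantee every true hit is found."""
    min_seeds = max(1, qgram_threshold(len(read), k, max_errors))
    hits = []
    for diag, _ in candidate_diagonals(read, index, k, min_seeds):
        if diag < 0 or diag + len(read) > len(reference):
            continue  # placement would run off either end of the reference
        window = reference[diag:diag + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_errors:
            hits.append((diag, mismatches))
    return hits


if __name__ == "__main__":
    ref = "CCCCATGCGTAACCCC"
    idx = build_kmer_index(ref, 3)
    # Read = ref[4:12] with one substitution (G -> T at read position 4).
    print(map_read("ATGCTTAA", ref, idx, 3, 1))  # [(4, 1)]
```

In the toy run, k = 3 and one allowed error give a threshold of 8 + 1 - 2 * 3 = 3 seeds. The read shares exactly three 3-mers with reference offset 4, so that diagonal passes the filter and verification reports one mismatch. Real long-read mappers must also handle indels, which spread seed hits across neighbouring diagonals. That case is not modeled here.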
Pages: 189-195
Number of pages: 7
References (20 in total)
[1] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. Journal of Molecular Biology, 1990, 215(3):403-410.
[2] Campagna D, Albiero A, Bilardi A, Caniato E, Forcato C, Manavski S, Vitulo N, Valle G. PASS: a program to align short sequences. Bioinformatics, 2009, 25(7):967-968.
[3] Kent WJ. Genome Research, 2002, 12:656. DOI 10.1101/gr.229202 (published online before March 2002).
[4] Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology, 2009, 10(3).
[5] Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research, 2008, 18(11):1851-1858.
[6] Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics, 2010, 26(5):589-595.
[7] Li R, Li Y, Kristiansen K, Wang J. SOAP: short oligonucleotide alignment program. Bioinformatics, 2008, 24(5):713-714.
[8] Lupski JR, Reid JG, Gonzaga-Jauregui C, Deiros DR, Chen DCY, Nazareth L, Bainbridge M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, Stankiewicz P, Halperin JJ, Yang C, Gehman C, Guo D, Irikat RK, Tom W, Fantin NJ, Muzny DM, Gibbs RA. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. New England Journal of Medicine, 2010, 362(13):1181-1191.
[9] Misra S. P ACM S APPL COMP AC, 2009.
[10] Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 1970, 48(3):443-453.