High Throughput Short Read Alignment via Bi-directional BWT

被引:78
作者
Lam, T. W. [1 ]
Li, Ruiqiang [2 ]
Tam, Alan [1 ]
Wong, Simon [1 ]
Wu, Edward [1 ]
Yiu, S. M. [1 ]
机构
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[2] Beijing Genom Inst Shenzhen, Shenzhen 518083, Peoples R China
来源
2009 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE | 2009年
关键词
short read alignment; BWT; indexing data structure; HUMAN GENOME; DNA;
D O I
10.1109/BIBM.2009.42
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
The advancement of sequencing technologies has made it feasible for researchers to consider many high-throughput biological applications. A core step of these applications is to align an enormous amount of short reads to a reference genome. For example, to resequence a human genome, billions of reads of 35 bp are produced in 1-2 weeks, putting a lot of pressure of faster software for alignment. Based on existing indexing and pattern matching technologies, several short read alignment software have been developed recently. Yet this is still strong need to further improve the speed. In this paper, we show a new indexing data structure called bi-directional BWT, which allows us to build the fastest software for aligning short reads. When compared with existing software (Bowtie is the best), our software is at least 3 times faster for finding unique best alignments, and 25 times faster for finding all possible alignments. We believe that bi-directional BWT is an interesting data structure on its own and could be applied to other pattern matching problems.
引用
收藏
页码:31 / +
页数:2
相关论文
共 18 条
[1]  
[Anonymous], 2009, BIOINFORMATICS
[2]  
[Anonymous], 1997, ACM SIGACT NEWS
[3]   Accurate whole human genome sequencing using reversible terminator chemistry [J].
Bentley, David R. ;
Balasubramanian, Shankar ;
Swerdlow, Harold P. ;
Smith, Geoffrey P. ;
Milton, John ;
Brown, Clive G. ;
Hall, Kevin P. ;
Evers, Dirk J. ;
Barnes, Colin L. ;
Bignell, Helen R. ;
Boutell, Jonathan M. ;
Bryant, Jason ;
Carter, Richard J. ;
Cheetham, R. Keira ;
Cox, Anthony J. ;
Ellis, Darren J. ;
Flatbush, Michael R. ;
Gormley, Niall A. ;
Humphray, Sean J. ;
Irving, Leslie J. ;
Karbelashvili, Mirian S. ;
Kirk, Scott M. ;
Li, Heng ;
Liu, Xiaohai ;
Maisinger, Klaus S. ;
Murray, Lisa J. ;
Obradovic, Bojan ;
Ost, Tobias ;
Parkinson, Michael L. ;
Pratt, Mark R. ;
Rasolonjatovo, Isabelle M. J. ;
Reed, Mark T. ;
Rigatti, Roberto ;
Rodighiero, Chiara ;
Ross, Mark T. ;
Sabot, Andrea ;
Sankar, Subramanian V. ;
Scally, Aylwyn ;
Schroth, Gary P. ;
Smith, Mark E. ;
Smith, Vincent P. ;
Spiridou, Anastassia ;
Torrance, Peta E. ;
Tzonev, Svilen S. ;
Vermaas, Eric H. ;
Walter, Klaudia ;
Wu, Xiaolin ;
Zhang, Lu ;
Alam, Mohammed D. ;
Anastasi, Carole .
NATURE, 2008, 456 (7218) :53-59
[4]   Whole-genome re-sequencing [J].
Bentley, David R. .
CURRENT OPINION IN GENETICS & DEVELOPMENT, 2006, 16 (06) :545-552
[5]  
Burrow M., 1994, 124 DIG EQ CORP
[6]   Whole-genome sequencing and variant discovery in C-elegans [J].
Hillier, LaDeana W. ;
Marth, Gabor T. ;
Quinlan, Aaron R. ;
Dooling, David ;
Fewell, Ginger ;
Barnett, Derek ;
Fox, Paul ;
Glasscock, Jarret I. ;
Hickenbotham, Matthew ;
Huang, Weichun ;
Magrini, Vincent J. ;
Richt, Ryan J. ;
Sander, Sacha N. ;
Stewart, Donald A. ;
Stromberg, Michael ;
Tsung, Eric F. ;
Wylie, Todd ;
Schedl, Tim ;
Wilson, Richard K. ;
Mardis, Elaine R. .
NATURE METHODS, 2008, 5 (02) :183-188
[7]  
JARVIE T., 2008, NATURE METHODS, V5
[8]   Genome-wide mapping of in vivo protein-DNA interactions [J].
Johnson, David S. ;
Mortazavi, Ali ;
Myers, Richard M. ;
Wold, Barbara .
SCIENCE, 2007, 316 (5830) :1497-1502
[9]  
Kao M.-Y., 2008, Encyclopedia of algorithms
[10]   Compressed indexing and local alignment of DNA [J].
Lam, T. W. ;
Sung, W. K. ;
Tam, S. L. ;
Wong, C. K. ;
Yiu, S. M. .
BIOINFORMATICS, 2008, 24 (06) :791-797