Introducing difference recurrence relations for faster semi-global alignment of long sequences

被引:78
作者
Suzuki, Hajime [1 ]
Kasahara, Masahiro [1 ]
机构
[1] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol & Med Sci, Kashiwa, Chiba, Japan
关键词
Sequence analysis; Alignment; Long read; BIT-VECTOR ALGORITHM; READ ALIGNMENT; SPEED-UP; PARALLEL; LIBRARY;
D O I
10.1186/s12859-018-2014-8
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Background: The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when sequences to be aligned are long; this situation hampers the use of the full SIMD width of modern processors. Results: We proposed a faster semi-global alignment algorithm, "difference recurrence relations," that runs more rapidly than the state-of-the-art algorithm by a factor of 2.1. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm can output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Conclusions: Our novel algorithm and optimized library implementation will facilitate accelerating nucleotide long-read analysis algorithms that use pairwise alignment stages.
引用
收藏
页数:15
相关论文
共 38 条
[21]   Discovery and genotyping of structural variation from long-read haploid genome sequence data [J].
Huddleston, John ;
Chaisson, Mark J. P. ;
Steinberg, Karyn Meltz ;
Warren, Wes ;
Hoekzema, Kendra ;
Gordon, David ;
Graves-Lindsay, Tina A. ;
Munson, Katherine M. ;
Kronenberg, Zev N. ;
Vives, Laura ;
Peluso, Paul ;
Boitano, Matthew ;
Chin, Chen-Shin ;
Korlach, Jonas ;
Wilson, Richard K. ;
Eichler, Evan E. .
GENOME RESEARCH, 2017, 27 (05) :677-685
[22]   Adaptive seeds tame genomic sequence comparison [J].
Kielbasa, Szymon M. ;
Wan, Raymond ;
Sato, Kengo ;
Horton, Paul ;
Frith, Martin C. .
GENOME RESEARCH, 2011, 21 (03) :487-493
[23]   A BIT-PARALLEL DYNAMIC PROGRAMMING ALGORITHM SUITABLE FOR DNA SEQUENCE ALIGNMENT [J].
Kimura, Kouichi ;
Koike, Asako ;
Nakai, Kenta .
JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2012, 10 (04)
[24]   Hybrid error correction and de novo assembly of single-molecule sequencing reads [J].
Koren, Sergey ;
Schatz, Michael C. ;
Walenz, Brian P. ;
Martin, Jeffrey ;
Howard, Jason T. ;
Ganapathy, Ganeshkumar ;
Wang, Zhong ;
Rasko, David A. ;
McCombie, W. Richard ;
Jarvis, Erich D. ;
Phillippy, Adam M. .
NATURE BIOTECHNOLOGY, 2012, 30 (07) :692-+
[25]  
Langmead B, 2012, NAT METHODS, V9, P357, DOI [10.1038/NMETH.1923, 10.1038/nmeth.1923]
[26]  
Li H., 2017, ARXIV E PRINTS
[27]   Fast and accurate short read alignment with Burrows-Wheeler transform [J].
Li, Heng ;
Durbin, Richard .
BIOINFORMATICS, 2009, 25 (14) :1754-1760
[28]   BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm [J].
Loving, Joshua ;
Hernandez, Yozen ;
Benson, Gary .
BIOINFORMATICS, 2014, 30 (22) :3166-3173
[29]   A fast bit-vector algorithm for approximate string matching based on dynamic programming [J].
Myers, G .
JOURNAL OF THE ACM, 1999, 46 (03) :395-415
[30]  
Myers G, 2014, LECT N BIOINFORMAT, V8701, P52, DOI 10.1007/978-3-662-44753-6_5