Linear work suffix array construction

被引:222
作者
Karkkainen, Juha
Sanders, Peter
Burkhardt, Stefan
机构
[1] Univ Helsinki, Dept Comp Sci, FI-00014 Helsinki, Finland
[2] Univ Karlsruhe, D-76128 Karlsruhe, Germany
[3] Google Inc, CH-8002 Zurich, Switzerland
关键词
algorithms; theory; difference cover; external memory algorithms; suffix array;
D O I
10.1145/1217856.1217858
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space-time tradeoff. For any v. [1, root n], it runs in O(vn) time using O(n/root v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
引用
收藏
页码:918 / 936
页数:19
相关论文
共 60 条
  • [1] Abouelhoda M. I., 2002, String Processing and Information Retrieval. 9th International Symposium, SPIRE 2002. Proceedings (Lecture Notes in Computer Science Vol.2476), P31
  • [2] Abouelhoda M. I., 2004, Journal of Discrete Algorithms, V2, P53, DOI 10.1016/S1570-8667(03)00065-0
  • [3] Suffix trees on words
    Andersson, A
    Larsson, NJ
    Swanson, K
    [J]. ALGORITHMICA, 1999, 23 (03) : 246 - 260
  • [4] [Anonymous], P 40 ANN S FDN COMP
  • [5] [Anonymous], P ACM S PAR ALG ARCH
  • [6] [Anonymous], J DISCRETE ALGORITHM
  • [7] [Anonymous], P 5 ANN ACM S PAR AL
  • [8] [Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
  • [9] Bertoni A., 2003, International Journal of Foundations of Computer Science, V14, P871, DOI 10.1142/S0129054103002060
  • [10] Burkhardt S, 2003, LECT NOTES COMPUT SC, V2676, P55