Improved approximate string matching using compressed suffix data structures

被引:0
作者
Lam, TW [1 ]
Sung, WK
Wong, SS
机构
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore 117548, Singapore
来源
ALGORITHMS AND COMPUTATION | 2005年 / 3827卷
关键词
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Approximate string matching is about finding a given string pattern in a text by allowing some degree of errors, In this paper we present a space efficient data structure to solve the 1-mismatch and 1-difference problems. Given a text T of length n over a fixed alphabet A, we can preprocess T and give an O(n root log n)-bit space data structure so that, for any query pattern P of length m, we can find all 1-mismatch (or 1-difference) occurrences of P in O(m log log n +occ) time, where occ is the number of occurrences. This is the fastest known query time given that the space of the data structure is o(n log(2) n) bits. The space of our data structure can be further reduced to O(n) if we can afford a slow down factor of log(epsilon) n, for 0 < epsilon <= 1. Furthermore, our solution can be generalized to solve the k-mismatch (and the k-difference) problem in O(\A\(k)m(k) (k + log log n) + occ) and O(log(epsilon) n(\A\(k)m(k)(k + log log n) + occ)) query time using an O(n root log n)-bit and an O(n)-bit indexing data structures, respectively.
引用
收藏
页码:339 / 348
页数:10
相关论文
共 14 条
[1]   Text indexing and dictionary matching with one error [J].
Amir, A ;
Keselman, D ;
Landau, GM ;
Lewenstein, M ;
Lewenstein, N ;
Rodeh, M .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2000, 37 (02) :309-325
[2]  
[Anonymous], J DISCRETE ALGORITHM
[3]  
BUCHSBAUM AL, 2000, P EUR S ALG, P120
[4]  
Cole Richard, 2004, P 36 ANN ACM S THEOR, P91, DOI DOI 10.1145/1007352.1007374
[5]  
Grossi R., 2000, Proceedings of the Thirty Second Annual ACM Symposium on Theory of Computing, P397, DOI 10.1145/335305.335351
[6]  
JOKINEN P, 1991, P 16 INT S MATH FDN, P240
[7]   Space efficient suffix trees [J].
Munro, JL ;
Raman, V ;
Rao, SS .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2001, 39 (02) :205-222
[8]  
NAVARRO G, 2000, P 11 ANN S COMB PATT, P350
[9]   Time-space trade-offs for compressed suffix arrays [J].
Rao, SS .
INFORMATION PROCESSING LETTERS, 2002, 82 (06) :307-311
[10]  
SADAKANE K, IN PRESS THEORY COMP