Efficient Algorithms for Substring Near Neighbor Problem

Cited by: 17

Authors:

Andoni, Alexandr

Indyk, Piotr

Affiliations:

Source:

PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS | 2006

Keywords:

DOI:

10.1145/1109557.1109690

Chinese Library Classification number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

In this paper we consider the problem of finding the approximate nearest neighbor when the data set points are the substrings of a given text $T$. Specifically, for a string $T$ of length $n$, we present a data structure which does the following: given a pattern $P$, if there is a substring of $T$ within the distance $R$ from $P$, it reports a (possibly different) substring of $T$ within distance $cR$ from $P$. The length of the pattern $P$, denoted by $m$, is not known in advance. For the case where the distances are measured using the Hamming distance, we present a data structure which uses $\tilde{O}(n^{1+1/c})$ space and with $\tilde{O}(n^{1/c} + m n^{o(1)})$ query time. This essentially matches the earlier bounds of [Ind98], which assumed that the pattern length $m$ is fixed in advance. In addition, our data structure can be constructed in time $\tilde{O}(n^{1+1/c} + n^{1+o(1)} M^{1/3})$, where $M$ is an upper bound for $m$. This essentially matches the preprocessing bound of [Ind98] as long as the term $\tilde{O}(n^{1+1/c})$ dominates the running time, which is the case when, e.g., $c < 3$. We also extend our results to the case where the distances are measured according to the $\ell_1$ distance. The query time and the space bound are essentially the same, while the preprocessing time becomes $\tilde{O}(n^{1+1/c} + n^{1+o(1)} M^{2/3})$.
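To make the query guarantee in the abstract concrete, here is a minimal brute-force sketch for the Hamming case. It is not the data structure from the paper: it simply scans every length-m substring of the text in O(nm) time, with no preprocessing. The function names `hamming` and `substring_near_neighbor` are hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def substring_near_neighbor(
    text: str, pattern: str, r: int, c: float = 1.0
) -> Optional[Tuple[int, int]]:
    """Brute-force baseline (an assumption for illustration, not the paper's method).

    Returns (start, distance) for the first length-m substring of `text`
    whose Hamming distance to `pattern` is at most c * r, or None if no
    such substring exists. If some substring lies within distance r, the
    scan is therefore certain to report one within distance c * r.
    """
    m = len(pattern)
    for start in range(len(text) - m + 1):
        d = hamming(text[start:start + m], pattern)
        if d <= c * r:
            return start, d
    return None
```

For example, `substring_near_neighbor("abracadabra", "acxx", r=1, c=2)` accepts any substring within distance 2 of the pattern and returns the match starting at index 3. The point of the data structure described in the abstract is to avoid this linear scan at query time, at the cost of $\tilde{O}(n^{1+1/c})$ space.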

Pages: 1203-1212

Page count: 10
