Approximating edit distance efficiently

被引:78
作者
Bar-Yossef, Z [1 ]
Jayram, TS [1 ]
Krauthgamer, R [1 ]
Kumar, R [1 ]
机构
[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
来源
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS | 2004年
关键词
D O I
10.1109/FOCS.2004.14
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than l, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)(2/3) gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. l gap problem, where e can be as small as O(k(2))-still with a constant sketch-but works only for strings that are mildly "non-repetitive". Finally, we develop an n(3/7)-approximation quasi-linear time algorithm for edit distance, improving the previous best factor of n(3/4) [5]; if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n(1/3).
引用
收藏
页码:550 / 559
页数:10
相关论文
共 25 条
  • [1] Andoni A, 2003, SIAM PROC S, P523
  • [2] [Anonymous], CYBERNETICS CONTROL
  • [3] [Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
  • [4] BABAI L, 1995, P 12 STACS, V900, P361
  • [5] Information theory methods in communication complexity
    Bar-Yossef, Z
    Jayram, TS
    Kumar, R
    Sivakumar, D
    [J]. 17TH ANNUAL IEEE CONFERENCE ON COMPUTATIONAL COMPLEXITY, PROCEEDINGS, 2002, : 93 - 102
  • [6] Batu T., 2003, 35 ANN ACM SIGACT S, P316, DOI [10.1145/780542.780590, DOI 10.1145/780542.780590, 10.1145/780542.780590.2, DOI 10.1145/780542.780590.2]
  • [7] Approximate string matching: A simpler faster algorithm
    Cole, R
    Hariharan, R
    [J]. SIAM JOURNAL ON COMPUTING, 2002, 31 (06) : 1761 - 1782
  • [8] Cormode G, 2002, SIAM PROC S, P667
  • [9] Cormode G, 2000, PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P197
  • [10] Feigenbaum J, 2001, LECT NOTES COMPUT SC, V2076, P927