Approximating edit distance efficiently

被引:78
作者
Bar-Yossef, Z [1 ]
Jayram, TS [1 ]
Krauthgamer, R [1 ]
Kumar, R [1 ]
机构
[1] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
来源
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS | 2004年
关键词
D O I
10.1109/FOCS.2004.14
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Edit distance has been extensively studied for the past several years. Nevertheless, no linear-time algorithm is known to compute the edit distance between two strings, or even to approximate it to within a modest factor Furthermore, for various natural algorithmic problems such as low-distortion embeddings into normed spaces, approximate nearest-neighbor schemes, and sketching algorithms, known results for the edit distance are rather weak. We develop algorithms that solve gap versions of the edit distance problem: given two strings of length n with the promise that their edit distance is either at most k or greater than l, decide which of the two holds. We present two sketching algorithms for gap versions of edit distance. Our first algorithm solves the k vs. (kn)(2/3) gap problem, using a constant size sketch. A more involved algorithm solves the stronger k vs. l gap problem, where e can be as small as O(k(2))-still with a constant sketch-but works only for strings that are mildly "non-repetitive". Finally, we develop an n(3/7)-approximation quasi-linear time algorithm for edit distance, improving the previous best factor of n(3/4) [5]; if the input strings are assumed to be non-repetitive, then the approximation factor can be strengthened to n(1/3).
引用
收藏
页码:550 / 559
页数:10
相关论文
共 25 条
[1]  
Andoni A, 2003, SIAM PROC S, P523
[2]  
[Anonymous], CYBERNETICS CONTROL
[3]  
[Anonymous], 1997, ALGORITHMS STRINGS T, DOI DOI 10.1017/CBO9780511574931
[4]  
BABAI L, 1995, P 12 STACS, V900, P361
[5]   Information theory methods in communication complexity [J].
Bar-Yossef, Z ;
Jayram, TS ;
Kumar, R ;
Sivakumar, D .
17TH ANNUAL IEEE CONFERENCE ON COMPUTATIONAL COMPLEXITY, PROCEEDINGS, 2002, :93-102
[6]  
Batu T., 2003, 35 ANN ACM SIGACT S, P316, DOI [10.1145/780542.780590, DOI 10.1145/780542.780590, 10.1145/780542.780590.2, DOI 10.1145/780542.780590.2]
[7]   Approximate string matching: A simpler faster algorithm [J].
Cole, R ;
Hariharan, R .
SIAM JOURNAL ON COMPUTING, 2002, 31 (06) :1761-1782
[8]  
Cormode G, 2002, SIAM PROC S, P667
[9]  
Cormode G, 2000, PROCEEDINGS OF THE ELEVENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P197
[10]  
Feigenbaum J, 2001, LECT NOTES COMPUT SC, V2076, P927