Streaming Algorithms for Embedding and Computing Edit Distance in the Low Distance Regime

被引:39
作者
Chakraborty, Diptarka [1 ]
Goldenberg, Elazar [2 ]
Koucky, Michal [2 ]
机构
[1] Indian Inst Technol Kanpur, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India
[2] Charles Univ Prague, Inst Comp Sci, Malostranske Namesti 25, Prague 11800 1, Czech Republic
来源
STOC'16: PROCEEDINGS OF THE 48TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING | 2016年
基金
欧洲研究理事会;
关键词
Edit distance; Hamming distance; randomized embedding; low distortion; string; kernel;
D O I
10.1145/2897518.2897577
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings x, y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertion, deletion, and bit flips needed for converting x into y. Whereas, the Hamming distance between x and y is the number of bit flips needed for converting x to y. In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. We show a randomized embedding with quadratic distortion. Namely, for any x, y satisfying that their edit distance equals k, the Hamming distance between the embedding of x and y is O(k(2)) with high probability. This improves over the distortion ratio of O(log n log* n) obtained by Jowhari (2012) for small values of k. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input. We provide several applications for this embedding. Among our results we provide a one-pass (streaming) algorithm for edit distance running in space O(s) and computing edit distance exactly up-to distance s(1/6). This algorithm is based on kernelization for edit distance that is of independent interest.
引用
收藏
页码:712 / 725
页数:14
相关论文
共 33 条
[1]  
Andoni A, 2003, SIAM PROC S, P523
[2]   Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity [J].
Andoni, Alexandr ;
Krauthgamer, Robert ;
Onak, Krzysztof .
2010 IEEE 51ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2010, :377-386
[3]  
Andoni A, 2009, ACM S THEORY COMPUT, P199
[4]  
[Anonymous], 1997, Communication complexity
[5]  
[Anonymous], 2001, FDN CRYPTOGRAPHY
[6]   Color Fractal Descriptors for Adaxial Epidermis Texture Classification [J].
Backes, Andre R. ;
de Mesquita Sa Junior, Jarbas Joaci ;
Kolb, Rosana Marta .
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015, 2015, 9423 :51-58
[7]   Approximating edit distance efficiently [J].
Bar-Yossef, Z ;
Jayram, TS ;
Krauthgamer, R ;
Kumar, R .
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, :550-559
[8]  
Batu T., 2003, ACM Comput. Surv, V35, P316, DOI [10.1145/780542.780590, DOI 10.1145/780542.780590]
[9]   Oblivious String Embeddings and Edit Distance Approximations [J].
Batu, Tugkan ;
Ergun, Funda ;
Sahinalp, Cenk .
PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, :792-801
[10]  
Cormode G, 2002, SIAM PROC S, P667