Streaming Algorithms for Embedding and Computing Edit Distance in the Low Distance Regime

被引:37
作者
Chakraborty, Diptarka [1 ]
Goldenberg, Elazar [2 ]
Koucky, Michal [2 ]
机构
[1] Indian Inst Technol Kanpur, Dept Comp Sci & Engn, Kanpur, Uttar Pradesh, India
[2] Charles Univ Prague, Inst Comp Sci, Malostranske Namesti 25, Prague 11800 1, Czech Republic
来源
STOC'16: PROCEEDINGS OF THE 48TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING | 2016年
基金
欧洲研究理事会;
关键词
Edit distance; Hamming distance; randomized embedding; low distortion; string; kernel;
D O I
10.1145/2897518.2897577
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The Hamming and the edit metrics are two common notions of measuring distances between pairs of strings x, y lying in the Boolean hypercube. The edit distance between x and y is defined as the minimum number of character insertion, deletion, and bit flips needed for converting x into y. Whereas, the Hamming distance between x and y is the number of bit flips needed for converting x to y. In this paper we study a randomized injective embedding of the edit distance into the Hamming distance with a small distortion. We show a randomized embedding with quadratic distortion. Namely, for any x, y satisfying that their edit distance equals k, the Hamming distance between the embedding of x and y is O(k(2)) with high probability. This improves over the distortion ratio of O(log n log* n) obtained by Jowhari (2012) for small values of k. Moreover, the embedding output size is linear in the input size and the embedding can be computed using a single pass over the input. We provide several applications for this embedding. Among our results we provide a one-pass (streaming) algorithm for edit distance running in space O(s) and computing edit distance exactly up-to distance s(1/6). This algorithm is based on kernelization for edit distance that is of independent interest.
引用
收藏
页码:712 / 725
页数:14
相关论文
共 33 条
  • [1] Andoni A, 2003, SIAM PROC S, P523
  • [2] Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity
    Andoni, Alexandr
    Krauthgamer, Robert
    Onak, Krzysztof
    [J]. 2010 IEEE 51ST ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, 2010, : 377 - 386
  • [3] Andoni A, 2009, ACM S THEORY COMPUT, P199
  • [4] [Anonymous], 1997, Communication complexity
  • [5] [Anonymous], 2001, FDN CRYPTOGRAPHY
  • [6] Color Fractal Descriptors for Adaxial Epidermis Texture Classification
    Backes, Andre R.
    de Mesquita Sa Junior, Jarbas Joaci
    Kolb, Rosana Marta
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015, 2015, 9423 : 51 - 58
  • [7] Approximating edit distance efficiently
    Bar-Yossef, Z
    Jayram, TS
    Krauthgamer, R
    Kumar, R
    [J]. 45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, : 550 - 559
  • [8] Batu T., 2003, ACM Comput. Surv, V35, P316, DOI [10.1145/780542.780590, DOI 10.1145/780542.780590]
  • [9] Oblivious String Embeddings and Edit Distance Approximations
    Batu, Tugkan
    Ergun, Funda
    Sahinalp, Cenk
    [J]. PROCEEDINGS OF THE SEVENTHEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, : 792 - 801
  • [10] Cormode G, 2002, SIAM PROC S, P667