A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

被引:3
|
作者
Chen, Kuan-Yu [1 ]
Chao, Kun-Mao [1 ]
机构
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
关键词
Compressed pattern matching; Edit distance; Run length;
D O I
10.1007/s00453-011-9592-4
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mnlogmn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first "fully compressed" algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, ma parts per thousand currency signn, we present an O(mn (2))-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn (2)) time.
引用
收藏
页码:354 / 370
页数:17
相关论文
共 14 条
  • [1] A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings
    Kuan-Yu Chen
    Kun-Mao Chao
    Algorithmica, 2013, 65 : 354 - 370
  • [2] Edit distance of run-length encoded strings
    Arbell, O
    Landau, GM
    Mitchell, JSB
    INFORMATION PROCESSING LETTERS, 2002, 83 (06) : 307 - 314
  • [3] A fast and simple algorithm for computing the longest common subsequence of run-length encoded strings
    Ann, Hsing-Yen
    Yang, Chang-Biau
    Tseng, Chiou-Ting
    Hor, Chiou-Yi
    INFORMATION PROCESSING LETTERS, 2008, 108 (06) : 360 - 364
  • [4] Computing the Longest Common Subsequence of Two Run-Length Encoded Strings
    Sakai, Yoshifumi
    ALGORITHMS AND COMPUTATION, ISAAC 2012, 2012, 7676 : 197 - 206
  • [5] Modified Run-Length Encoding Method and Distance Algorithm to Classify Run-Length Encoded Binary Data
    Kathirvalavalakumar, T.
    Palaniappan, R.
    CONTROL, COMPUTATION AND INFORMATION SYSTEMS, 2011, 140 : 271 - 280
  • [6] Approximate matching of run-length compressed strings
    Mäkinen, V
    Navarro, G
    Ukkonen, E
    ALGORITHMICA, 2003, 35 (04) : 347 - 369
  • [7] Hardness of comparing two run-length encoded strings
    Chen, Kuan-Yu
    Hsu, Ping-Hui
    Chao, Kun-Mao
    JOURNAL OF COMPLEXITY, 2010, 26 (04) : 364 - 374
  • [8] Edit distance for a run-length-encoded string and an uncompressed string
    Liu, J. J.
    Huang, G. S.
    Wang, Y. L.
    Lee, R. C. T.
    INFORMATION PROCESSING LETTERS, 2007, 105 (01) : 12 - 16
  • [9] Approximate Matching for Run-Length Encoded Strings Is 3SUM-Hard
    Chen, Kuan-Yu
    Hsu, Ping-Hui
    Chao, Kun-Mao
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2009, 5577 : 168 - 179
  • [10] Classification of run-length encoded binary data
    Babu, T. Ravindra
    Murty, M. Narasimha
    Agrawal, V. K.
    PATTERN RECOGNITION, 2007, 40 (01) : 321 - 323