A Fully Compressed Algorithm for Computing the Edit Distance of Run-Length Encoded Strings

Cited by: 3
Authors
Chen, Kuan-Yu [1]
Chao, Kun-Mao [1]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
Compressed pattern matching; Edit distance; Run length
DOI
10.1007/s00453-011-9592-4
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mn log mn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first "fully compressed" algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, m ≤ n, we present an O(mn^2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn^2) time.
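For orientation, the two notions the abstract relies on (run-length encoding and edit distance) can be sketched in a few lines of Python. This is not the authors' O(mn^2) algorithm: it is the classical dynamic program applied to the decoded strings, so its running time depends on the uncompressed lengths, which is the dependence the paper removes. Unit costs for insertion, deletion, and substitution are assumed, and all function names are illustrative.

```python
from itertools import groupby


def rle_encode(s):
    """Compress a string into a list of (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (character, run_length) pairs back into a string."""
    return "".join(ch * count for ch, count in runs)


def edit_distance(a, b):
    """Classical unit-cost edit distance, keeping only two DP rows."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def rle_edit_distance_baseline(runs_a, runs_b):
    """Baseline only: decode both inputs, then run the classical DP.

    The cost is proportional to the product of the uncompressed lengths,
    not to the run counts m and n.
    """
    return edit_distance(rle_decode(runs_a), rle_decode(runs_b))
```

With this baseline, `rle_encode("aaabbc")` yields three runs, and comparing two such encodings still costs time proportional to the decoded lengths regardless of how few runs they contain.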
Pages: 354-370
Page count: 17