Approximate Circular Pattern Matching Under Edit Distance

被引:2
作者
Charalampopoulos, Panagiotis [1 ]
Pissis, Solon P. [2 ,3 ]
Radoszewski, Jakub [4 ]
Rytter, Wojciech [4 ]
Walen, Tomasz [4 ]
Zuba, Wiktor [2 ]
机构
[1] Birkbeck Univ London, London, England
[2] CWI, Amsterdam, Netherlands
[3] Vrije Univ, Amsterdam, Netherlands
[4] Univ Warsaw, Warsaw, Poland
来源
41ST INTERNATIONAL SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE, STACS 2024 | 2024年 / 289卷
关键词
circular pattern matching; approximate pattern matching; edit distance; FASTER; TIME;
D O I
10.4230/LIPIcs.STACS.2024.24
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
In the k-Edit Circular Pattern Matching (k-Edit CPM) problem, we are given a length-n text T, a length-m pattern P, and a positive integer threshold k, and we are to report all starting positions of the substrings of T that are at edit distance at most k from some cyclic rotation of P. In the decision version of the problem, we are to check if any such substring exists. Very recently, Charalampopoulos et al. [ESA 2022] presented O(nk(2))-time and O(nk log(3) k) -time solutions for the reporting and decision versions of k-Edit CPM, respectively. Here, we show that the reporting and decision versions of k-Edit CPM can be solved in (n (n + m) k(6)) time and O(n+ (n + m) k(5) log(3) k) time, respectively, thus obtaining the first algorithms with a complexity of the type O(n + (n1m)poly(k)) for this problem. Notably, our algorithms run in O(n) time when m = Omega(k6) and are superior to the previous respective solutions when m = omega(k(4)). We provide a meta-algorithm that yields efficient algorithms in several other interesting settings, such as when the strings are given in a compressed form (as straight-line programs), when the strings are dynamic, or when we have a quantum computer. We obtain our solutions by exploiting the structure of approximate circular occurrences of P in T, when T is relatively short w.r.t. P. Roughly speaking, either the starting positions of approximate occurrences of rotations of P form Omega(k(4)) intervals that can be computed efficiently, or some rotation of P is almost periodic (is at a small edit distance from a string with small period). Dealing with the almost periodic case is the most technically demanding part of this work; we tackle it using properties of locked fragments (originating from [Cole and Hariharan, SICOMP 2002]).
引用
收藏
页数:22
相关论文
共 39 条
[1]  
Ambainis A, 2004, TR LOG STUD LOG LIB, V23, P15
[2]   Faster algorithms for string matching with k mismatches [J].
Amir, A ;
Lewenstein, M ;
Porat, E .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2004, 50 (02) :257-275
[3]   A faster and more accurate heuristic for cyclic edit distance computation [J].
Ayad, Lorraine A. K. ;
Barton, Carl ;
Pissis, Solon P. .
PATTERN RECOGNITION LETTERS, 2017, 88 :81-87
[4]   MARS: improving multiple circular sequence alignment using refined sequences [J].
Ayad, Lorraine A. K. ;
Pissis, Solon P. .
BMC GENOMICS, 2017, 18
[5]   EDIT DISTANCE CANNOT BE COMPUTED IN STRONGLY SUBQUADRATIC TIME (UNLESS SETH IS FALSE) [J].
Backurs, Arturs ;
Indyk, Piotr .
SIAM JOURNAL ON COMPUTING, 2018, 47 (03) :1087-1097
[6]   ELEMENTARY GATES FOR QUANTUM COMPUTATION [J].
BARENCO, A ;
BENNETT, CH ;
CLEVE, R ;
DIVINCENZO, DP ;
MARGOLUS, N ;
SHOR, P ;
SLEATOR, T ;
SMOLIN, JA ;
WEINFURTER, H .
PHYSICAL REVIEW A, 1995, 52 (05) :3457-3467
[7]   Fast algorithms for approximate circular string matching [J].
Barton, Carl ;
Iliopoulos, Costas S. ;
Pissis, Solon P. .
ALGORITHMS FOR MOLECULAR BIOLOGY, 2014, 9
[8]  
Bathie Gabriel, 2023, LIPICS, V283, DOI DOI 10.4230/LIPICS.ISAAC.2023.10
[9]  
Bringmann K, 2019, Disc Algorithms, P1126
[10]   Complexity measures and decision tree complexity: a survey [J].
Buhrman, H ;
de Wolf, R .
THEORETICAL COMPUTER SCIENCE, 2002, 288 (01) :21-43