Efficient algorithms for (δ, γ, α) and (δ, kΔ, α)-matching

被引:4
作者
Fredriksson, Kimmo [1 ]
Grabowski, Szymon [2 ]
机构
[1] Univ Joensuu, Dept Comp Sci & Stat, FIN-80101 Joensuu, Finland
[2] Tech Univ Lodz, Dept Comp Engn, PL-90924 Lodz, Poland
关键词
approximate string matching; music information retrieval; computational biology; bit-parallelism; sparse dynamic programming; bounded gaps;
D O I
10.1142/S0129054108005607
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
We propose new algorithms for (delta, gamma, alpha)-matching. In this string matching problem we axe given a pattern P = p(0)p(1)...p(m-1) and a text T = t(0)t(1)...t(n-1) over some integer alphabet Sigma = {0...sigma - 1}. The pattern symbol p(i) delta-matches the text symbol t(j) iff vertical bar p(i) - t(j)vertical bar <= delta. The pattern P (delta, gamma)-matches some text substring t(j)... t(j+m-1) iff for all i it holds that vertical bar p(i) - t(j+1)vertical bar <= delta and Sigma vertical bar p(i) - t(j+i)vertical bar <= gamma. Finally, in (delta, gamma, alpha)-matching we also permit at most alpha-symbol gaps between each matching text symbol. The only known previous algorithm runs in O(nm) time. We give several algorithms that improve the average case up to O(n) for small alpha, and the worst case to O(min{nm, vertical bar M vertical bar alpha}) or O(nm log(gamma)/w), where M = {(i, j) vertical bar vertical bar p(i) - t(j)vertical bar <= delta} and w is the number of bits in a machine word. The proposed algorithms can be easily modified to solve several other related problems, we explicitly consider e.g. character classes (instead of delta-matching), (Delta-limited) k-mismatches (instead of gamma-matching) and more general gaps, including negative ones. These find important applications in computational biology. We conclude with experimental results showing that the algorithms are very efficient in practice.
引用
收藏
页码:163 / 183
页数:21
相关论文
共 50 条
[31]   Space-efficient computation of parallel approximate string matching [J].
Sadiq, Muhammad Umair ;
Yousaf, Muhammad Murtaza .
JOURNAL OF SUPERCOMPUTING, 2023, 79 (08) :9093-9126
[32]   An efficient algorithm for δ-approximate matching with α-bounded gaps in musical sequences [J].
Cantone, D ;
Cristofaro, S ;
Faro, S .
EXPERIMENTAL AND EFFICIENT ALGORITHMS, PROCEEDINGS, 2005, 3503 :428-439
[33]   Efficient Many-To-Many Point Matching in One Dimension [J].
Justin Colannino ;
Mirela Damian ;
Ferran Hurtado ;
Stefan Langerman ;
Henk Meijer ;
Suneeta Ramaswami ;
Diane Souvaine ;
Godfried Toussaint .
Graphs and Combinatorics, 2007, 23 :169-178
[34]   Efficient Implementations of the Approximate String Matching on the Memory Machine Models [J].
Nakano, Koji .
2012 THIRD INTERNATIONAL CONFERENCE ON NETWORKING AND COMPUTING (ICNC 2012), 2012, :233-239
[35]   Efficient Hardware Implementation of Snapshotting Algorithms for NoC Applications [J].
Zene, Andrei ;
Chirap, Claudiu-Teodor ;
Cret, Octavian ;
Vacariu, Lucia .
ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY, 2015, 18 (01) :79-92
[36]   THAAD: Efficient matching queries under temporal abstraction for anomaly detection [J].
Mateless, Roni ;
Segal, Michael ;
Moskovitch, Robert .
PERFORMANCE EVALUATION, 2021, 149
[37]   SeqMatcher: efficient genome sequence matching with AVX-512 extensions [J].
Espinosa, Elena ;
Quislant, Ricardo ;
Larrosa, Rafael ;
Plata, Oscar .
JOURNAL OF SUPERCOMPUTING, 2025, 81 (01)
[38]   Efficient string-matching allowing for non-overlapping inversions [J].
Cantone, Domenico ;
Cristofaro, Salvatore ;
Faro, Simone .
THEORETICAL COMPUTER SCIENCE, 2013, 483 :85-95
[39]   Efficient automatic exact motif discovery algorithms for biological sequences [J].
Karci, Ali .
EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (04) :7952-7963
[40]   Space and time efficient parallel algorithms and software for EST clustering [J].
Kalyanaraman, A ;
Aluru, S ;
Brendel, V ;
Kothari, S .
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2003, 14 (12) :1209-1221