A randomized algorithm for approximate string matching

被引:28
|
作者
Atallah, MJ [1 ]
Chyzak, F
Dumas, P
机构
[1] Purdue Univ, CERIAS, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France
关键词
convolution; FFT; approximate string matching; randomized algorithms;
D O I
10.1007/s004530010062
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We give a randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. A direct application is approximate string matching. The randomized algorithm uses convolution to find an estimator of the scores; the variance of the estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input, or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case and to higher dimensions.
引用
收藏
页码:468 / 486
页数:19
相关论文
共 50 条
  • [41] Approximate string matching: A lightweight approach to recognize gestures with Kinect
    Ibanez, Rodrigo
    Soria, Alvaro
    Teyseyre, Alfredo
    Rodriguez, Guillermo
    Campo, Marcelo
    PATTERN RECOGNITION, 2017, 62 : 73 - 86
  • [42] Approximate String Matching for Detecting Keywords in Scanned Business Documents
    Ha, Thi Hien
    RASLAN 2019: RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING, 2019, : 49 - 54
  • [43] Parallel Algorithms for Approximate String Matching with k Mismatches on CUDA
    Liu, Yu
    Guo, Longjiang
    Li, Jinbao
    Ren, Meirui
    Li, Keqin
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 2414 - 2421
  • [44] Approximate Boyer-Moore String Matching for Small Alphabets
    Salmela, Leena
    Tarhio, Jorma
    Kalsi, Petri
    ALGORITHMICA, 2010, 58 (03) : 591 - 609
  • [45] Content based video retrieval based on approximate string matching
    Masihi, ZG
    Charkari, NM
    EUROCON 2005: THE INTERNATIONAL CONFERENCE ON COMPUTER AS A TOOL, VOL 1 AND 2 , PROCEEDINGS, 2005, : 1304 - 1307
  • [46] Space-efficient computation of parallel approximate string matching
    Sadiq, Muhammad Umair
    Yousaf, Muhammad Murtaza
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (08): : 9093 - 9126
  • [47] Approximate Boyer-Moore String Matching for Small Alphabets
    Leena Salmela
    Jorma Tarhio
    Petri Kalsi
    Algorithmica, 2010, 58 : 591 - 609
  • [48] Bit-parallel approximate string matching algorithms with transposition
    Hyyro, Heikki
    JOURNAL OF DISCRETE ALGORITHMS, 2005, 3 (2-4) : 215 - 229
  • [49] New algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance (vol 74, pg 1815, 2018)
    Ho, ThienLuan
    Oh, Seung-Rohk
    Kim, HyunJin
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (05): : 1835 - 1835
  • [50] A subquadratic algorithm for approximate limited expression matching
    Wu, S
    Manber, U
    Myers, G
    ALGORITHMICA, 1996, 15 (01) : 50 - 67