A randomized algorithm for approximate string matching

被引:28
|
作者
Atallah, MJ [1 ]
Chyzak, F
Dumas, P
机构
[1] Purdue Univ, CERIAS, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France
关键词
convolution; FFT; approximate string matching; randomized algorithms;
D O I
10.1007/s004530010062
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We give a randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. A direct application is approximate string matching. The randomized algorithm uses convolution to find an estimator of the scores; the variance of the estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input, or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case and to higher dimensions.
引用
收藏
页码:468 / 486
页数:19
相关论文
共 50 条
  • [31] Processor array architectures for flexible approximate string matching
    Michallidis, Panagiotis D.
    Margaritis, Konstantinos G.
    JOURNAL OF SYSTEMS ARCHITECTURE, 2008, 54 (1-2) : 35 - 54
  • [32] APPROXIMATE STRING-MATCHING WITH DONT CARE CHARACTERS
    AKUTSU, T
    INFORMATION PROCESSING LETTERS, 1995, 55 (05) : 235 - 239
  • [33] On-line approximate string matching in natural language
    Fredriksson, Kimmo
    FUNDAMENTA INFORMATICAE, 2006, 72 (04) : 453 - 466
  • [34] A windowed weighted approach for approximate cyclic string matching
    Mollineda, RA
    Vidal, E
    Casacuberta, F
    16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITON, VOL IV, PROCEEDINGS, 2002, : 188 - 191
  • [35] Approximate string matching for stroke direction and pressure sequences
    Cha, SH
    Shin, YC
    Srihari, SN
    DOCUMENT RECOGNITION AND RETRIEVAL VII, 2000, 3967 : 2 - 10
  • [36] A Parallel Algorithm for the Fixed-length Approximate String Matching Problem for High Throughput Sequencing Technologies
    Iliopoulos, Costas S.
    Mouchard, Laurent
    Pissis, Solon P.
    PARALLEL COMPUTING: FROM MULTICORES AND GPU'S TO PETASCALE, 2010, 19 : 150 - 157
  • [37] A new filtration method and a hybrid strategy for approximate string matching
    Lu, Chia Wei
    Lu, Chin Lung
    Lee, R. C. T.
    THEORETICAL COMPUTER SCIENCE, 2013, 481 : 9 - 17
  • [38] Content based video retrieval based on approximate string matching
    Masihi, ZG
    Charkari, NM
    Eurocon 2005: The International Conference on Computer as a Tool, Vol 1 and 2 , Proceedings, 2005, : 1300 - 1303
  • [39] Efficient Implementations of the Approximate String Matching on the Memory Machine Models
    Nakano, Koji
    2012 THIRD INTERNATIONAL CONFERENCE ON NETWORKING AND COMPUTING (ICNC 2012), 2012, : 233 - 239
  • [40] Space-efficient computation of parallel approximate string matching
    Muhammad Umair Sadiq
    Muhammad Murtaza Yousaf
    The Journal of Supercomputing, 2023, 79 : 9093 - 9126