A randomized algorithm for approximate string matching

被引:28
|
作者
Atallah, MJ [1 ]
Chyzak, F
Dumas, P
机构
[1] Purdue Univ, CERIAS, W Lafayette, IN 47907 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[3] Inst Natl Rech Informat & Automat, F-78153 Le Chesnay, France
关键词
convolution; FFT; approximate string matching; randomized algorithms;
D O I
10.1007/s004530010062
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We give a randomized algorithm in deterministic time O(N log M) for estimating the score vector of matches between a text string of length N and a pattern string of length M, i.e., the vector obtained when the pattern is slid along the text, and the number of matches is counted for each position. A direct application is approximate string matching. The randomized algorithm uses convolution to find an estimator of the scores; the variance of the estimator is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumption is made about the probabilistic characteristics of the input, or about the size of the alphabet. The solution extends to string matching with classes, class complements, "never match" and "always match" symbols, to the weighted case and to higher dimensions.
引用
收藏
页码:468 / 486
页数:19
相关论文
共 50 条
  • [1] A lower-variance randomized algorithm for approximate string matching
    Atallah, Mikhail J.
    Grigorescu, Elena
    Wu, Yi
    INFORMATION PROCESSING LETTERS, 2013, 113 (18) : 690 - 692
  • [2] A Consensus Algorithm for Approximate String Matching
    Rubio, Miguel
    Alba, Alfonso
    Mendez, Martin
    Arce-Santana, Edgar
    Rodriguez-Kessler, Margarita
    3RD IBEROAMERICAN CONFERENCE ON ELECTRONICS ENGINEERING AND COMPUTER SCIENCE, CIIECC 2013, 2013, 7 : 322 - 327
  • [3] Approximate String Matching Algorithm for Phishing Detection
    Abraham, Dona
    Raj, Nisha S.
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2014, : 2285 - 2290
  • [4] A Fast Approximate String Matching Algorithm on GPU
    Nunes, Lucas S. N.
    Bordim, J. L.
    Nakano, K.
    Ito, Y.
    PROCEEDINGS OF 2015 THIRD INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2015, : 188 - 192
  • [5] A Preprocessing for Approximate String Matching
    Baba, Kensuke
    Nakatoh, Tetsuya
    Yamada, Yasuhiro
    Ikeda, Daisuke
    INFORMATICS ENGINEERING AND INFORMATION SCIENCE, PT II, 2011, 252 : 610 - +
  • [6] An approximate string matching algorithm for on-line Chinese character recognition
    Pao, DCW
    Sun, MC
    Lam, MCH
    IMAGE AND VISION COMPUTING, 1997, 15 (09) : 695 - 703
  • [7] On approximate string matching of unique oligonucleotides
    Hyyrö, H
    Vihinen, M
    Juhola, M
    MEDINFO 2001: PROCEEDINGS OF THE 10TH WORLD CONGRESS ON MEDICAL INFORMATICS, PTS 1 AND 2, 2001, 84 : 960 - 964
  • [8] Compressed Indexes for Approximate String Matching
    Chan, Ho-Leung
    Lam, Tak-Wah
    Sung, Wing-Kin
    Tam, Siu-Lung
    Wong, Swee-Seong
    ALGORITHMICA, 2010, 58 (02) : 263 - 281
  • [9] Compressed Indexes for Approximate String Matching
    Ho-Leung Chan
    Tak-Wah Lam
    Wing-Kin Sung
    Siu-Lung Tam
    Swee-Seong Wong
    Algorithmica, 2010, 58 : 263 - 281
  • [10] Fast and practical approximate string matching
    BaezaYates, RA
    Perleberg, CH
    INFORMATION PROCESSING LETTERS, 1996, 59 (01) : 21 - 27