On the bit-parallel simulation of the nondeterministic Aho-Corasick and suffix automata for a set of patterns

被引:7
作者
Cantone, Domenico [1 ]
Faro, Simone [1 ]
Giaquinta, Emanuele [1 ]
机构
[1] Univ Catania, Dipartimento Matemat & Informat, Viale Andrea Doria 6, I-95125 Catania, Italy
关键词
Multiple pattern matching; Text processing; Automata; Bit-parallelism;
D O I
10.1016/j.jda.2011.02.001
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced by the trie and by the Directed Acyclic Word Graph for a set of patterns, respectively. When the prefix redundancy is nonnegligible, this method yields-if compared to the original bit-parallel encoding with no prefix factorization-a representation that requires smaller bit-vectors and, correspondingly, less words. In particular, if we restrict to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on such a technique, for searching a set P of patterns in a text T of length n over an alphabet Sigma of size sigma. Our algorithms, named Log-And and Backward-Log-And, require O((m+sigma) inverted right perpendicular m/w inverted left perpendicular )-space, and work in O(n inverted right perpendicular m/w inverted left perpendicular) and O(n inverted right perpendicular m/w inverted left perpendicular l(min)) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and l(min) is the length of the shortest pattern in P. (C) 2011 Elsevier B.V. All rights reserved.
引用
收藏
页码:25 / 36
页数:12
相关论文
共 14 条
[1]   EFFICIENT STRING MATCHING - AID TO BIBLIOGRAPHIC SEARCH [J].
AHO, AV ;
CORASICK, MJ .
COMMUNICATIONS OF THE ACM, 1975, 18 (06) :333-340
[2]  
Arndt J., 2011, MATTERS COMPUTATIONA
[3]   A NEW APPROACH TO TEXT SEARCHING [J].
BAEZAYATES, R ;
GONNET, GH .
COMMUNICATIONS OF THE ACM, 1992, 35 (10) :74-82
[4]   AVERAGE SIZES OF SUFFIX TREES AND DAWGS [J].
BLUMER, A ;
EHRENFEUCHT, A ;
HAUSSLER, D .
DISCRETE APPLIED MATHEMATICS, 1989, 24 (1-3) :37-45
[5]   COMPLETE INVERTED FILES FOR EFFICIENT TEXT RETRIEVAL AND ANALYSIS [J].
BLUMER, A ;
BLUMER, J ;
HAUSSLER, D ;
MCCONNELL, R ;
EHRENFEUCHT, A .
JOURNAL OF THE ACM, 1987, 34 (03) :578-595
[6]   A space efficient bit-parallel algorithm for the multiple string matching problem [J].
Cantone, Domenico ;
Faro, Simone .
INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2006, 17 (06) :1235-1251
[7]  
CROCHEMORE Maxime, 1994, TEXT ALGORITHMS
[8]  
Hoperoft J.E., 2001, INTRO AUTOMATA THEOR
[9]  
Knuth D. E., 1977, SIAM Journal on Computing, V6, P323, DOI 10.1137/0206024
[10]   New techniques for regular expression searching [J].
Navarro, G ;
Raffinot, M .
ALGORITHMICA, 2005, 41 (02) :89-116