TOOLS FOR VERY FAST REGULAR EXPRESSION MATCHING

被引:6
作者
Pasetto, Davide
Petrini, Fabrizio [1 ]
Agarwal, Virat [1 ,2 ]
机构
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
关键词
DotStar; Expression matching; Multicore processors; Regular expressions;
D O I
10.1109/MC.2010.80
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
DotStar is an innovative algorithmic solution that compiles user-provided regular expressions into a compact automaton using a sequence of more manageable intermediate representations. The system can search using a single pass without backtracking, and is both space and time efficient. The entire software tool chain supports the extended Posix standard syntax for regex. At the core of DotStar is a large class of regex that DotStar tools can compile into an Aho-Corasick automaton, the de facto standard for keyword scanning, going through a series of intermediate representations that include the Glushkov automaton. The DotStar tool chain comprises five elements, a classification engine/preprocessor, an expander, a compiler, a compactor and a runtime engine. The DotStar preprocessor examines the regex set, recognizing the problematic parts, and the expander fragments them into multiple DotStar expressions. The expander allocates a minimal set of add-on data items and annotates expressions with set and test operations.
引用
收藏
页码:50 / 58
页数:9
相关论文
共 9 条
  • [1] Characterization of Glushkov automata
    Caron, P
    Ziadi, D
    [J]. THEORETICAL COMPUTER SCIENCE, 2000, 233 (1-2) : 75 - 90
  • [2] GLUSHKOV VM, 1961, RUSS MATH SURV, V16, P1, DOI [DOI 10.1070/RM1961V016N05ABEH004112, 10.1070/RM1961v016n05ABEH004112]
  • [3] Probing Higgs-sector CP violation at a photon collider
    Lee, Jae Sik
    [J]. MODERN PHYSICS LETTERS A, 2007, 22 (17) : 1191 - 1208
  • [4] Navarro Gonzalo, 2002, Flexible Pattern Matching in Strings: Practical On-Line Search Algorithms for Texts and Biological Sequences
  • [5] Petrini F, 2009, P INT C HIGH PERF CO, P1
  • [6] Satish Kumar Satish Kumar, 2007, Postharvest management and value addition, P155, DOI 10.1145/1323548.1323574
  • [7] High-Performance Regular Expression Scanning on the Cell/BE Processor
    Scarpazza, Daniele Paolo
    Russell, Gregory F.
    [J]. ICS'09: PROCEEDINGS OF THE 2009 ACM SIGARCH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, 2009, : 14 - 25
  • [8] Sourdis I, 2003, LECT NOTES COMPUT SC, V2778, P880
  • [9] VANLUNTEREN J, P WORKSH ACC HIGH PE