Resource-Efficient Regular Expression Matching Architecture for Text Analytics

被引:0
作者
Atasu, Kubilay [1 ]
机构
[1] IBM Res Zurich, Zurich, Switzerland
来源
PROCEEDINGS OF THE 2014 IEEE 25TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS (ASAP 2014) | 2014年
关键词
D O I
暂无
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Text analytics systems, such as IBM's SystemT software, rely on regular expressions (regexs) and dictionaries for transforming unstructured data into a structured format. Unlike network intrusion detection systems, text analytics systems compute and report precisely where the specific and sensitive information starts and ends in a text document. Therefore, advanced regex matching functions, such as start-offset reporting, capturing groups, and leftmost match computation are heavily used in text analytics systems. We present a novel regex matching architecture that supports such functions in a resource-efficient way. The resource efficiency is achieved by 1) eliminating state replication, 2) avoiding expensive offset comparison operations in leftmost match computation, and 3) minimizing the number of offset registers. Experiments on regex sets from text analytics and network intrusion detection domains, using an Altera Stratix IV FPGA, show that the proposed architecture achieves a more than threefold reduction of the logic resources used and a more than 1.25-fold increase of the clock frequency with respect to a recently proposed architecture that supports identical features.
引用
收藏
页码:1 / 8
页数:8
相关论文
共 23 条
[1]   High Speed Architectures for Finding the First two Maximum/Minimum Values [J].
Amaru, Luca G. ;
Martina, Maurizio ;
Masera, Guido .
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2012, 20 (12) :2342-2346
[2]  
[Anonymous], 2001, 9 ANN IEEE S FIELD P, DOI DOI 10.1109/FCCM.2001.22
[3]  
Atasu K., 2013, Field Programmable Logic and Applications (FPL), 2013 23rd International Conference on, P1, DOI [10.1109/FPL.2013.6645534, DOI 10.1109/FPL.2013.6645534]
[4]   Exploring the design space of programmable regular expression matching accelerators [J].
Atasu, Kubilay ;
Polig, Raphael ;
Rohrer, Jonathan ;
Hagleitner, Christoph .
JOURNAL OF SYSTEMS ARCHITECTURE, 2013, 59 (10) :1184-1196
[5]   A methodology for synthesis of efficient intrusion detection systems on FPGAs [J].
Baker, ZK ;
Prasanna, VK .
12TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2004, :135-144
[6]   A supernodal formulation of vertex colouring with applications in course timetabling [J].
Burke, Edmund K. ;
Marecek, Jakub ;
Parkes, Andrew J. ;
Rudova, Hana .
ANNALS OF OPERATIONS RESEARCH, 2010, 179 (01) :105-130
[7]   THE COMPILATION OF REGULAR EXPRESSIONS INTO INTEGRATED-CIRCUITS [J].
FLOYD, RW ;
ULLMAN, JD .
JOURNAL OF THE ACM, 1982, 29 (03) :603-622
[8]  
Hopcroft John E., 2000, INTRO AUTOMATA THEOR
[9]   Follow automata [J].
Ilie, L ;
Yu, S .
INFORMATION AND COMPUTATION, 2003, 186 (01) :140-162
[10]   MINIMAL NFA PROBLEMS ARE HARD [J].
JIANG, T ;
RAVIKUMAR, B .
SIAM JOURNAL ON COMPUTING, 1993, 22 (06) :1117-1141