An SMT Solver for Regular Expressions and Linear Arithmetic over String Length

被引:24
作者
Berzish, Murphy [1 ]
Kulczynski, Mitja [2 ]
Mora, Federico [3 ]
Manea, Florin [4 ,5 ]
Day, Joel D. [6 ]
Nowotka, Dirk [2 ]
Ganesh, Vijay [1 ]
机构
[1] Univ Waterloo, Waterloo, ON, Canada
[2] Univ Kiel, Kiel, Germany
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
[4] Univ Gottingen, Gottingen, Germany
[5] Campus Inst Data Sci, Gottingen, Germany
[6] Loughborough Univ, Loughborough, Leics, England
来源
COMPUTER AIDED VERIFICATION, PT II, CAV 2021 | 2021年 / 12760卷
关键词
String solvers; SMT solvers; Regular expressions; WORD EQUATIONS; MODEL;
D O I
10.1007/978-3-030-81688-9_14
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
We present a novel length-aware solving algorithm for the quantifier-free first-order theory over regex membership predicate and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. A crucial insight that underpins our algorithm is that real-world regex and string formulas contain a wealth of information about upper and lower bounds on lengths of strings, and such information can be used very effectively to simplify operations on automata representing regular expressions. Additionally, we present a number of novel general heuristics, such as the prefix/suffix method, that can be used to make a variety of regex solving algorithms more efficient in practice. We showcase the power of our algorithm and heuristics via an extensive empirical evaluation over a large and diverse benchmark of 57256 regex-heavy instances, almost 75% of which are derived from industrial applications or contributed by other solver developers. Our solver outperforms five other state-of-the-art string solvers, namely, CVC4, OSTRICH, Z3seq, Z3str3, and Z3-Trau, over this benchmark, in particular achieving a speedup of 2.4x over CVC4, 4.4x over Z3seq, 6.4xover Z3-Trau, 9.1xover Z3str3, and 13xover OSTRICH.
引用
收藏
页码:289 / 312
页数:24
相关论文
共 42 条
[1]   Efficient Handling of String-Number Conversion [J].
Abdulla, Parosh Aziz ;
Atig, Mohamed Faouzi ;
Chen, Yu-Fang ;
Diep, Bui Phi ;
Dolby, Julian ;
Jank, Petr ;
Lin, Hsin-Hung ;
Holik, Lukas ;
Wu, Wei-Cheng .
PROCEEDINGS OF THE 41ST ACM SIGPLAN CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION (PLDI '20), 2020, :943-957
[2]  
Abdulla PA, 2017, ACM SIGPLAN NOTICES, V52, P602, DOI [10.1145/3140587.3062384, 10.1145/3062341.3062384]
[3]  
Abdulla PA, 2014, LECT NOTES COMPUT SC, V8559, P150, DOI 10.1007/978-3-319-08867-9_10
[4]   Automata-Based Model Counting for String Constraints [J].
Aydin, Abdulbaki ;
Bang, Lucas ;
Bultan, Tevfik .
COMPUTER AIDED VERIFICATION, PT I, 2015, 9206 :255-272
[5]   One-Click Formal Methods [J].
Backes, John ;
Bolignano, Pauline ;
Cook, Byron ;
Gacek, Andrew ;
Luckow, Kasper Soe ;
Rungta, Neha ;
Schaef, Martin ;
Schlesinger, Cole ;
Tanash, Rima ;
Varming, Carsten ;
Whalen, Michael .
IEEE SOFTWARE, 2019, 36 (06) :61-65
[6]  
Barbosa H., 2020, 15 INT SAT MOD THEOR
[7]  
Berzish M., STRING THEORIES INVO, P2021
[8]  
Berzish M., 2020, LENGTH AWARE REGULAR
[9]  
Berzish M, 2017, PROCEEDINGS OF THE 17TH CONFERENCE ON FORMAL METHODS IN COMPUTER AIDED DESIGN (FMCAD 2017), P55, DOI 10.23919/FMCAD.2017.8102241
[10]  
Bjorner N., 2012, SMT WORKSH 2012