Measuring the expressive power of practical regular expressions by classical stacking automata models

Authors
Nogami, Taisei [1]
Terauchi, Tachio [1]
Affiliation
[1] Waseda Univ, 3-4-1 Okubo Shinjuku Ku, Tokyo 1698555, Japan
Keywords
Regular expressions; Backreferences; Lookaheads; Expressive power
DOI
10.1016/j.ic.2025.105303
CLC classification number
TP301 [Theory and Methods]
Discipline code
081202
Abstract
A rewb is a regular expression extended with backreferences. Backreferences are a widely used practical extension of regular expressions and are supported by most modern regular expression engines, including those in the standard libraries of Java and Python. Indexed languages, in turn, are the languages generated by indexed grammars, a class of formal grammars proposed by A. V. Aho. We show that the expressive powers of these two models are related as follows: every language described by a rewb is an indexed language. Since the smallest formal grammar class previously known to contain the rewb languages was the class of context-sensitive languages, this result strictly improves the known upper bound. We further prove four claims: (1) there exists a rewb whose language is not a stack language, stack languages being a proper subclass of indexed languages; (2) the language described by a rewb without a captured reference is a nonerasing stack language, nonerasing stack languages being a proper subclass of stack languages; (3) there exists a rewb that describes a stack language but not a nonerasing stack language; and (4) a rewb extended with another practical extension, lookaheads, can describe a non-indexed language. Finally, we show that the hierarchy investigated in a prior study, which separates the expressive power of rewbs by the notion of nested levels, lies within the class of nonerasing stack languages. © 2025 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
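To make the notion of a rewb concrete, here is a minimal sketch using Python's `re` module. Its `(...)` capture and `\1` backreference syntax is assumed here as a stand-in for the paper's formal rewb notation. A capture group referenced later forces an exact repetition of the captured text, so short patterns describe the copy language {ww : w ∈ {a,b}*} and the language {aⁿbaⁿbaⁿ : n ≥ 0}. Neither language is context-free, yet by the result stated above both are indexed languages. The helper names `in_copy_language` and `in_triple_language` are hypothetical, introduced only for illustration.

```python
import re

# Copy language {ww : w in {a,b}*}: group 1 captures some w,
# and the backreference \1 demands an exact second copy of it.
COPY = re.compile(r"([ab]*)\1")

# {a^n b a^n b a^n : n >= 0}: one capture referenced twice
# forces three blocks of a's of equal length.
TRIPLE = re.compile(r"(a*)b\1b\1")


def in_copy_language(s: str) -> bool:
    """Hypothetical helper: True iff s = ww for some w over {a, b}."""
    return COPY.fullmatch(s) is not None


def in_triple_language(s: str) -> bool:
    """Hypothetical helper: True iff s = a^n b a^n b a^n for some n >= 0."""
    return TRIPLE.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["abab", "abba", "", "aabaab"]:
        print(repr(s), in_copy_language(s))
    for s in ["aabaabaa", "aabab", "bb"]:
        print(repr(s), in_triple_language(s))
```

Python's `re` engine decides such patterns by backtracking, which can take exponential time in the worst case; the sketch illustrates what rewbs can express, not how efficiently their membership can be decided.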
Pages: 20
References (21 in total)
[1] Aho, Alfred V. Algorithms for Finding Patterns in Strings. 1991: 255.
[2] Aho, Alfred V. Indexed Grammars: An Extension of Context-Free Grammars. Journal of the ACM, 1968, 15(4): 647.
[3] Aho, Alfred V. Nested Stack Automata. Journal of the ACM, 1969, 16(3): 383.
[4] Berglund, Martin; van der Merwe, Brink; van Litsenborgh, Steyn. Regular Expressions with Lookahead. Journal of Universal Computer Science, 2021, 27(4): 324-340.
[5] Berglund, Martin; van der Merwe, Brink. Re-examining Regular Expressions with Backreferences. Theoretical Computer Science, 2023, 940: 66-80.
[6] Campeanu, C. International Journal of Foundations of Computer Science, 2003, 14: 1007. DOI 10.1142/S012905410300214X.
[7] Carle, Benjamin; Narendran, Paliath. On Extended Regular Expressions. Language and Automata Theory and Applications, 2009, 5457: 279-289.
[8] Chida, Nariyoshi; Terauchi, Tachio. On Lookaheads in Regular Expressions with Backreferences. IEICE Transactions on Information and Systems, 2023, E106D(5): 959-975.
[9] Ecma International. ECMAScript 2023 Language Specification. 2022.
[10] Freydenberger, Dominik D.; Schmid, Markus L. Deterministic Regular Expressions with Back-References. Journal of Computer and System Sciences, 2019, 105: 1-39.