Regular expression types for XML

Cited by: 78
Authors
Hosoya, H
Vouillon, J
Pierce, BC
Affiliations
[1] Univ Tokyo, Fac Sci, Bunkyo Ku, Tokyo 113, Japan
[2] Univ Paris 07, F-75251 Paris 05, France
[3] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
Source
ACM TRANSACTIONS ON PROGRAMMING LANGUAGES AND SYSTEMS | 2005 / Vol. 27 / No. 1
Keywords
type systems; XML; subtyping
DOI
10.1145/1053468.1053470
CLC classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to describe XML documents. The novelty of our type system is a semantic presentation of subtyping, as inclusion between the sets of documents denoted by two types. We give several examples illustrating the usefulness of this form of subtyping in XML processing. The decision problem for the subtype relation reduces to the inclusion problem between tree automata, which is known to be EXPTIME-complete. To avoid this high complexity in typical cases, we develop a practical algorithm that, unlike classical algorithms based on determinization of tree automata, checks the inclusion relation by a top-down traversal of the original type expressions. The main advantage of this algorithm is that it can exploit the property that type expressions being compared often share portions of their representations. Our algorithm is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several new implementation techniques, correctness proofs, and preliminary performance measurements on some small programs in the domain of typed XML processing.
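The abstract defines subtyping semantically, as inclusion between the sets of documents that two types denote. It also describes checking that inclusion top-down on the type expressions, rather than by determinizing automata. The Python sketch that follows illustrates both ideas on a deliberately simplified model, and it makes two assumptions:

- Types are regular expressions over atomic labels, so they describe flat sequences only, with no nested element contents.
- Inclusion is decided with Brzozowski derivatives [4] plus a set of already-assumed pairs.

This is a swapped-in textbook technique, not the authors' tree-automata algorithm. The names `Lab`, `seq`, `alt`, `star`, `opt`, `deriv`, and `subtype` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

# Hypothetical simplified model: regular expression types over atomic labels
# (flat sequences only; the paper's types also have nested elements l[T]).

@dataclass(frozen=True)
class Empty:  # the empty set of sequences
    pass

@dataclass(frozen=True)
class Eps:  # only the empty sequence ()
    pass

@dataclass(frozen=True)
class Lab:  # a single label such as name or email
    name: str

@dataclass(frozen=True)
class Seq:  # concatenation S, T (kept right-nested by seq())
    left: Re
    right: Re

@dataclass(frozen=True)
class Alt:  # alternation, stored as a set: associative, commutative, idempotent
    options: frozenset

@dataclass(frozen=True)
class Star:  # repetition T*
    body: Re

Re = Union[Empty, Eps, Lab, Seq, Alt, Star]
EMPTY, EPS = Empty(), Eps()

def alt(*rs: Re) -> Re:
    parts = set()
    for r in rs:
        if isinstance(r, Alt):
            parts |= r.options  # flatten nested alternations
        elif r != EMPTY:
            parts.add(r)        # Empty is the unit of alternation
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else Alt(frozenset(parts))

def _seq2(a: Re, b: Re) -> Re:
    if a == EMPTY or b == EMPTY:
        return EMPTY            # Empty annihilates concatenation
    if a == EPS:
        return b
    if b == EPS:
        return a
    if isinstance(a, Seq):      # re-associate to the right
        return _seq2(a.left, _seq2(a.right, b))
    return Seq(a, b)

def seq(*rs: Re) -> Re:
    result: Re = EPS
    for r in reversed(rs):
        result = _seq2(r, result)
    return result

def star(r: Re) -> Re:
    if r == EMPTY or r == EPS:
        return EPS
    return r if isinstance(r, Star) else Star(r)

def opt(r: Re) -> Re:  # T? is T | ()
    return alt(r, EPS)

def nullable(r: Re) -> bool:  # does r accept the empty sequence?
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Lab)):
        return False
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    return any(nullable(o) for o in r.options)

def deriv(r: Re, l: str) -> Re:  # sequences w such that l, w is in r
    if isinstance(r, (Empty, Eps)):
        return EMPTY
    if isinstance(r, Lab):
        return EPS if r.name == l else EMPTY
    if isinstance(r, Alt):
        return alt(*(deriv(o, l) for o in r.options))
    if isinstance(r, Star):
        return seq(deriv(r.body, l), r)
    head = seq(deriv(r.left, l), r.right)
    return alt(head, deriv(r.right, l)) if nullable(r.left) else head

def labels(r: Re) -> set:
    if isinstance(r, Lab):
        return {r.name}
    if isinstance(r, Seq):
        return labels(r.left) | labels(r.right)
    if isinstance(r, Alt):
        return set().union(*(labels(o) for o in r.options))
    return labels(r.body) if isinstance(r, Star) else set()

def subtype(s: Re, t: Re) -> bool:
    """Semantic subtyping: is every sequence in s also in t?"""
    alphabet = labels(s) | labels(t)
    seen: set = set()  # pairs assumed to be included (coinductive)
    todo = [(s, t)]
    while todo:
        a, b = todo.pop()
        if a == EMPTY or (a, b) in seen:
            continue
        seen.add((a, b))
        if nullable(a) and not nullable(b):
            return False  # a accepts a sequence that b rejects
        todo.extend((deriv(a, l), deriv(b, l)) for l in alphabet)
    return True

name, email, tel = Lab("name"), Lab("email"), Lab("tel")
assert subtype(seq(name, email, star(email)), seq(name, star(email)))
assert not subtype(seq(name, star(email)), seq(name, email, star(email)))
assert subtype(seq(name, opt(tel)), seq(name, star(alt(email, tel))))
```

The `seen` set plays the role of an assumption set. A pair is recorded before its derivatives are explored, which is sound because inclusion is the greatest relation closed under derivatives. Storing alternation as a set identifies expressions up to associativity, commutativity, and idempotence. That keeps the number of distinct derivatives finite (Brzozowski's result, reference [4]), so the loop terminates.

This sketch does not cover nested element types l[T]. Handling them is where the paper's algorithm does its real work.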
Pages: 46-90
Page count: 45
Related papers (38 in total)
[1] AIKEN A, 1991, LECT NOTES COMPUTER, V523
[2] AMADIO RM, 1993, ACM T PROG LANG SYST, P104
[3] BRANDT M, 1998, FUNDAMENTA INFORMATICAE, V33, P309
[4] BRZOZOWSKI JA. DERIVATIVES OF REGULAR EXPRESSIONS [J]. JOURNAL OF THE ACM, 1964, 11(4): 481-&
[5] BUNEMAN P, 1997, LECT NOTES COMPUT SC, V1186, P336
[6] BUNEMAN P, 1998, LECT NOTES COMPUTER, V1686
[7] CAI JZ, PAIGE R. USING MULTISET DISCRIMINATION TO SOLVE LANGUAGE PROCESSING PROBLEMS WITHOUT HASHING [J]. THEORETICAL COMPUTER SCIENCE, 1995, 145(1-2): 189-228
[8] CHAWATHE SS, 1999, PROCEEDINGS OF THE TWENTY-FIFTH INTERNATIONAL CONFERENCE ON VERY LARGE DATA BASES, P90
[9] CLARK J, 2001, TREX TREE REGULAR EX
[10] CLARK J, 1999, XSL TRANSFORMATIONS