Learning Restricted Regular Expressions with Interleaving from XML Data

被引:6
|
作者
Li, Yeting [1 ,2 ]
Zhang, Xiaolan [1 ,2 ]
Xu, Han [2 ,3 ]
Mou, Xiaoying [1 ,2 ]
Chen, Haiming [1 ]
机构
[1] Chinese Acad Sci, Inst Software, State Key Lab Comp Sci, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, China Natl Sci Lib, Beijing, Peoples R China
来源
CONCEPTUAL MODELING, ER 2018 | 2018年 / 11157卷
基金
中国国家自然科学基金;
关键词
Schema learning; Regular expression; Interleaving;
D O I
10.1007/978-3-030-00847-5_43
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
The presence of a schema for XML documents has numerous advantages. However, many XML documents in practice are not accompanied by a schema or by a valid schema. Therefore, it is essential to devise algorithms to learn a schema from XML documents. The fundamental task in XML schema learning is inferring restricted subclasses of regular expressions. Previous work in this direction lacks support of interleaving. In this paper, based on the analysis of real data, we first propose a new subclass of regular expressions with interleaving, named as Extended Subclass of Regular Expressions with Interleaving (ESIRE). Then, based on single occurrence automaton and maximum independent set, we propose an algorithm GenESIRE to infer ESIREs from a set of given samples. Finally, we conduct a series of experiments to analyze the inference effectness of GenESIRE. Experimental results show that regular expressions inferred by GenESIRE are more precise compared with other methods, measured by different indicators.
引用
收藏
页码:586 / 593
页数:8
相关论文
共 50 条
  • [1] Inferring Regular Expressions with Interleaving from XML Data
    Zhang, Xiaolan
    Li, Yeting
    Tian, Fei
    Cui, Fanlin
    Dong, Chunmei
    Chen, Haiming
    WEB AND BIG DATA (APWEB-WAIM 2018), PT II, 2018, 10988 : 44 - 52
  • [2] Discovering Restricted Regular Expressions with Interleaving
    Peng, Feifei
    Chen, Haiming
    WEB TECHNOLOGIES AND APPLICATIONS (APWEB 2015), 2015, 9313 : 104 - 115
  • [3] Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
    Li, Yeting
    Chen, Haiming
    Zhang, Lingqi
    Huang, Bo
    Zhang, Jianzhao
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2020, PT II, 2020, 12085 : 769 - 781
  • [4] Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data
    Bex, Geert Jan
    Gelade, Wouter
    Neven, Frank
    Vansummeren, Stijn
    ACM TRANSACTIONS ON THE WEB, 2010, 4 (04)
  • [5] Learning k-Occurrence Regular Expressions with Interleaving
    Li, Yeting
    Zhang, Xiaolan
    Cao, Jialun
    Chen, Haiming
    Gao, Chong
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT II, 2019, 11447 : 70 - 85
  • [6] Deterministic Regular Expressions with Interleaving
    Peng, Feifei
    Chen, Haiming
    Mou, Xiaoying
    THEORETICAL ASPECTS OF COMPUTING - ICTAC 2015, 2015, 9399 : 203 - 220
  • [7] An Effective Algorithm for Learning Single Occurrence Regular Expressions with Interleaving
    Li, Yeting
    Chen, Haiming
    Zhang, Xiaolan
    Zhang, Lingqi
    IDEAS '19: PROCEEDINGS OF THE 23RD INTERNATIONAL DATABASE APPLICATIONS & ENGINEERING SYMPOSIUM (IDEAS 2019), 2019, : 189 - 198
  • [8] Learning Restricted Deterministic Regular Expressions with Counting
    Wang, Xiaofan
    Chen, Haiming
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2019, 2019, 11881 : 98 - 114
  • [9] Fast Learning of Restricted Regular Expressions and DTDs
    Freydenberger, Dominik D.
    Koetzing, Timo
    THEORY OF COMPUTING SYSTEMS, 2015, 57 (04) : 1114 - 1158
  • [10] Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking
    Colazzo, D.
    Ghelli, G.
    Pardini, L.
    Sartiani, C.
    THEORETICAL COMPUTER SCIENCE, 2013, 492 : 88 - 116