Schema Inference for Multi-Model Data

Times cited: 1
Authors
Koupil, Pavel [1]
Hricko, Sebastian [1]
Holubova, Irena [1]
Affiliations
[1] Charles Univ Prague, Dept Software Engn, Prague, Czech Republic
Source
PROCEEDINGS OF THE 25TH INTERNATIONAL ACM/IEEE CONFERENCE ON MODEL DRIVEN ENGINEERING LANGUAGES AND SYSTEMS, MODELS 2022 | 2022
Keywords
schema inference; multi-model data; cross-model references; data redundancy
DOI
10.1145/3550355.3552400
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The knowledge of a structural schema of data is a crucial aspect of most data management tasks. Unfortunately, in many real-world scenarios, the data is not accompanied by it, and schema-inference approaches need to be utilised. In this paper, we focus on a specific and complex use case of multi-model data where several often contradictory features of the combined models must be considered. Hence, single-model approaches cannot be applied straightforwardly. In addition, the data often reach the scale of Big Data, and thus a scalable solution is inevitable. In our approach, we reflect all these challenges. In addition, we can also infer local integrity constraints as well as intra- and inter-model references. Last but not least, we can cope with cross-model data redundancy. Using a set of experiments, we prove the advantages of the proposed approach and we compare it with related work.
Pages: 13-23
Number of pages: 11