An Approach for Schema Extraction of JSON and Extended JSON Document Collections

Cited by: 14
Authors
Frozza, Angelo Augusto [1 ]
Mello, Ronaldo dos Santos [2 ]
da Costa, Felipe de Souza [2 ]
Affiliations
[1] Catarinense Fed Inst, Camboriu, SC, Brazil
[2] Univ Fed Santa Catarina, Florianopolis, SC, Brazil
Source
2018 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI) | 2018
Keywords
NoSQL; JSON; Extended JSON; Schema Extraction; JSON Schema; Document-Oriented Database
DOI
10.1109/IRI.2018.00060
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
JSON documents are emerging as a common format for data representation due to the increasing popularity of NoSQL document-oriented databases. One reason for this popularity is their ability to handle large volumes of data in the absence of an explicit data schema. However, schema information is sometimes essential for applications, for example during data retrieval, integration and analysis tasks. Given this context, this paper presents an approach that extracts a schema from a JSON or Extended JSON document collection stored in a NoSQL document-oriented database or other document repository. Aggregation operations are used to obtain a schema for each distinct structure in the collection, and a hierarchical data structure is proposed to group these schemas in order to generate a global schema in JSON Schema format. Experiments conducted on real datasets, such as DBPedia and Foursquare, demonstrate that the accuracy of the generated schemas is equivalent or even superior to that of related work.
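The two stages named in the abstract (one schema per distinct structure, then a merge into a global JSON Schema) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names `raw_schema`, `distinct_structures`, `merge` and `global_schema` are hypothetical, a plain in-memory list stands in for the database aggregation step, a simple recursive merge stands in for the paper's hierarchical data structure, and Extended JSON types are not handled. The sketch assumes a non-empty collection.

```python
import json
from collections import Counter

def json_type(value):
    """Map a Python value decoded from JSON to a JSON Schema type name."""
    if isinstance(value, bool):   # test bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    return "array" if isinstance(value, list) else "object"

def raw_schema(value):
    """Replace every leaf value with its type name, keeping the nesting."""
    if isinstance(value, dict):
        return {k: raw_schema(v) for k, v in value.items()}
    if isinstance(value, list):
        return [raw_schema(v) for v in value]
    return json_type(value)

def distinct_structures(docs):
    """Count how many documents share each distinct raw schema."""
    return Counter(json.dumps(raw_schema(d), sort_keys=True) for d in docs)

def merge(raws):
    """Fold a non-empty list of raw schemas into one JSON Schema fragment."""
    objects = [r for r in raws if isinstance(r, dict)]
    arrays = [r for r in raws if isinstance(r, list)]
    variants = [{"type": t} for t in sorted({r for r in raws if isinstance(r, str)})]
    if objects:
        keys = sorted({k for o in objects for k in o})
        variants.append({
            "type": "object",
            "properties": {k: merge([o[k] for o in objects if k in o]) for k in keys},
            # a field is required only if every object variant contains it
            "required": [k for k in keys if all(k in o for o in objects)],
        })
    if arrays:
        items = [i for a in arrays for i in a]
        schema = {"type": "array"}
        if items:
            schema["items"] = merge(items)
        variants.append(schema)
    return variants[0] if len(variants) == 1 else {"anyOf": variants}

def global_schema(docs):
    """Merge the distinct structures of a collection into one schema."""
    return merge([json.loads(s) for s in distinct_structures(docs)])

docs = [{"name": "a", "age": 3}, {"name": "b"}, {"name": "c", "age": "old"}, {"name": "d"}]
print(json.dumps(global_schema(docs), indent=2))
```

Deduplicating structures before merging means the merge cost depends on the number of distinct structures rather than on the number of documents, which is the motivation the abstract gives for working per distinct structure.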
Pages: 356-363
Page count: 8
Related papers
50 in total
  • [1] JSON document clustering based on schema embeddings
    Priya, D. Uma
    Thilagam, P. Santhi
    JOURNAL OF INFORMATION SCIENCE, 2024, 50 (05) : 1112 - 1130
  • [2] Foundations of JSON Schema
    Pezoa, Felipe
    Reutter, Juan L.
    Suarez, Fernando
    Ugarte, Martin
    Vrgoc, Domagoj
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16), 2016 : 263 - 273
  • [3] JSON Schema Inference Approaches
    Contos, Pavel
    Svoboda, Martin
    ADVANCES IN CONCEPTUAL MODELING, ER 2020, 2020, 12584 : 173 - 183
  • [4] Witness Generation for JSON Schema
    Attouche, Lyes
    Baazizi, Mohamed-Amine
    Colazzo, Dario
    Ghelli, Giorgio
    Sartiani, Carlo
    Scherzinger, Stefanie
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2022, 15 (13) : 4002 - 4014
  • [5] A web service based on RESTful API and JSON Schema/JSON Meta Schema to construct knowledge graphs
    Agocs, Adam
    Le Goff, Jean-Marie
    2018 INTERNATIONAL CONFERENCE ON COMPUTER, INFORMATION AND TELECOMMUNICATION SYSTEMS (IEEE CITS 2018), 2018 : 167 - 171
  • [6] JSONDISCOVERER: Visualizing the schema lurking behind JSON documents
    Canovas Izquierdo, Javier Luis
    Cabot, Jordi
    KNOWLEDGE-BASED SYSTEMS, 2016, 103 : 52 - 55
  • [7] Nested Schema Mappings for Integrating JSON
    Hai, Rihan
    Quix, Christoph
    Kensche, David
    CONCEPTUAL MODELING, ER 2018, 2018, 11157 : 397 - 405
  • [8] Negation-closure for JSON Schema
    Baazizi, Mohamed-Amine
    Colazzo, Dario
    Ghelli, Giorgio
    Sartiani, Carlo
    Scherzinger, Stefanie
    THEORETICAL COMPUTER SCIENCE, 2023, 955
  • [9] JSON Schema Matching: Empirical Observations
    Waghray, Kunal
    SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020 : 2887 - 2889
  • [10] Reducing Ambiguity in Json Schema Discovery
    Spoth, William
    Kennedy, Oliver
    Lu, Ying
    Hammerschmidt, Beda
    Liu, Zhen Hua
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021 : 1732 - 1744