SJSON: A succinct representation for JSON documents

Cited by: 3
Authors
Lee, Junhee [1]
Anjos, Edman [2]
Satti, Srinivasa Rao [1]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] Google, Hamburg, Germany
Keywords
JSON; Succinct data structure; Semi-structured document representation; Heterogeneous array indexing; SUFFIX ARRAYS; TREES
DOI
10.1016/j.is.2020.101686
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The massive amounts of data processed in modern computational systems are becoming a problem of increasing importance. This data is commonly stored directly or indirectly through the use of data exchange languages, such as JSON (JavaScript Object Notation) and XML (eXtensible Markup Language), for human-readable platform-agnostic access. This paper focuses on exploring a set of succinct representations for JSON documents, which we call SJSON, achieving both reduced RAM and disk usage while supporting efficient queries on the documents. The representations we propose are mainly based on the idea that JSON documents can be decomposed into a structural part and a raw data part. In our method, we emulate the structure of the JSON document as a rooted ordered tree and represent it using succinct data structures, as opposed to the usual pointer-based implementation. Furthermore, the remaining raw data is reorganized into arrays of attributes and values. This deconstruction between structure and data allows for a straightforward connection between a node in the succinct tree and its corresponding name-value pair, dispensing with pointers altogether. The proposed scheme is implemented as the SJSON library in C++, and evaluated with respect to a number of metrics, comparing its performance with popular alternative JSON parsers. Empirical results show that the library is able to represent JSON files succinctly while efficiently supporting traversal queries. © 2020 Elsevier Ltd. All rights reserved.
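
The decomposition described in the abstract can be illustrated with a small Python sketch. It is not the SJSON library's actual encoding or API: the balanced-parentheses layout, the function names (`encode`, `preorder_rank`, `find_close`, `children`), and the `"{}"`/`"[]"` placeholders for container nodes are all assumptions made for illustration. The tree structure becomes a parenthesis string, the names and values go into two preorder arrays, and a node's preorder rank is the only link between them, so no pointers are stored.

```python
import json


def encode(doc):
    """Split a parsed JSON value into (parens, names, values), all in preorder."""
    parens, names, values = [], [], []

    def visit(name, node):
        parens.append("(")
        names.append(name)
        if isinstance(node, dict):
            values.append("{}")        # placeholder: the children carry the data
            for key, child in node.items():
                visit(key, child)
        elif isinstance(node, list):
            values.append("[]")        # placeholder, as above
            for child in node:
                visit(None, child)     # array elements have no name
        else:
            values.append(node)
        parens.append(")")

    visit(None, doc)
    return "".join(parens), names, values


def preorder_rank(parens, pos):
    """Count the '(' before pos: the node's index into names and values."""
    return parens.count("(", 0, pos)


def find_close(parens, pos):
    """Position of the ')' matching the '(' at pos (linear scan, for clarity only)."""
    depth = 0
    for i in range(pos, len(parens)):
        depth += 1 if parens[i] == "(" else -1
        if depth == 0:
            return i
    raise ValueError("unbalanced parentheses")


def children(parens, pos):
    """Yield the '(' position of each child of the node that opens at pos."""
    child = pos + 1
    while parens[child] == "(":
        yield child
        child = find_close(parens, child) + 1


doc = json.loads('{"id": 7, "tags": ["a", "b"], "owner": {"name": "kim"}}')
parens, names, values = encode(doc)

# Name-value pairs of the root's children, recovered by rank alone.
pairs = [(names[preorder_rank(parens, c)], values[preorder_rank(parens, c)])
         for c in children(parens, 0)]
print(parens)
print(pairs)
```

In this sketch `preorder_rank` and `find_close` scan the string linearly. A succinct tree representation would instead pack the parentheses into a bit vector and answer both operations with small auxiliary indexes, which is where the space and time guarantees come from.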
Pages: 15