CoreDB: a Data Lake Service

Cited by: 40
Authors
Beheshti, Amin [1]
Benatallah, Boualem [1]
Nouri, Reza [1]
Van Munin Chhieng [1]
Xiong, HuangTao [1]
Zhao, Xu [1]
Affiliation
[1] Univ New South Wales, Sydney, NSW, Australia
Source
CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2017
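Funding
Australian Research Council
Keywords
Data Lake; Database Service; Data API
DOI
10.1145/3132847.3133171
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract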
Continuous improvements in connectivity, storage, and data processing capabilities allow access to a data deluge from sensors, social media, news, user-generated, government, and private data sources. Accordingly, in a modern data-oriented landscape, with the advent of various data capture and management technologies, organizations are rapidly shifting to datafication of their processes. In such an environment, analysts may need to deal with a collection of datasets, from relational to NoSQL, that holds a vast amount of data gathered from various private/open data islands, i.e., a data lake. Organizing, indexing, and querying the growing volume of internal data and metadata in a data lake is challenging and requires varied skills and experience to deal with dozens of new database and indexing technologies: How to store information items? What technology to use for persisting the data? How to deal with the large volume of streaming data? How to trace and persist information about data? What technology to use for indexing the data? How to query the data lake? To address the above-mentioned challenges, we present CoreDB, an open-source data lake service that offers researchers and developers a single REST API to organize, index, and query their data and metadata. CoreDB manages multiple database technologies and offers a built-in design for security and tracing.
Pages: 2451-2454
Number of pages: 4