An IDEA: An Ingestion Framework for Data Enrichment in AsterixDB

Cited by: 7
Authors
Wang, Xikui [1]
Carey, Michael J. [1]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92717 USA
Source
Proceedings of the VLDB Endowment | 2019 / Vol. 12 / No. 11
Funding
National Science Foundation (USA)
DOI
10.14778/3342263.3342628
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Big Data today is being generated at an unprecedented rate from various sources such as sensors, applications, and devices, and it often needs to be enriched based on other reference information to support complex analytical queries. Depending on the use case, the enrichment operations can be compiled code, declarative queries, or machine learning models with different complexities. For enrichments that will be frequently used in the future, it can be advantageous to push their computation into the ingestion pipeline so that they can be stored (and queried) together with the data. In some cases, the referenced information may change over time, so the ingestion pipeline should be able to adapt to such changes to guarantee the currency and/or correctness of the enrichment results. In this paper, we present a new data ingestion framework that supports data ingestion at scale, enrichments requiring complex operations, and adaptiveness to reference data changes. We explain how this framework has been built on top of Apache AsterixDB and investigate its performance at scale under various workloads.
Pages: 1485-1498
Page count: 14
Related papers
50 in total
  • [1] DynaHash: Efficient Data Rebalancing in Apache AsterixDB
    Luo, Chen
    Carey, Michael J.
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 485 - 497
  • [2] A Framework for Semantic Enrichment of Sensor Data
    Moraru, Alexandra
    Mladenic, Dunja
    PROCEEDINGS OF THE ITI 2012 34TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY INTERFACES (ITI), 2012, : 155 - 160
  • [3] A framework for semantic enrichment of sensor data
    Moraru, Alexandra
    Mladenić, Dunja
    Moraru, A. (alexandra.moraru@ijs.si), 1600, University of Zagreb Faculty of Electrical Engineering and Computing (20): : 167 - 173
  • [4] A semantic framework for textual data enrichment
    Gutierrez, Yoan
    Vazquez, Sonia
    Montoyo, Andres
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 57 : 248 - 269
  • [5] A Scalable and Robust Framework for Data Stream Ingestion
    Isah, Haruna
    Zulkernine, Farhana
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 2900 - 2905
  • [6] An LSM-based Tuple Compaction Framework for Apache AsterixDB
    Alkowaileet, Wail Y.
    Alsubaiee, Sattam
    Carey, Michael J.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (09): : 1388 - 1400
  • [7] Enhancing Big Data with Semantics: The AsterixDB Approach (Poster)
    Alkowaileet, Wail
    Alsubaiee, Sattam
    Carey, Michael J.
    Li, Chen
    Ramampiaro, Heri
    Sinthong, Phanwadee
    Wang, Xikui
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 314 - 315
  • [8] Optimization of Merge Policy in AsterixDB Big Data Management System
    Zhang, Jie
    Li, Zhiyuan
    You, Yidou
    Huang, Runge
    Liu, Jin
    Chen, Xu
    2016 INTERNATIONAL CONFERENCE ON IDENTIFICATION, INFORMATION AND KNOWLEDGE IN THE INTERNET OF THINGS (IIKI), 2016, : 33 - 38
  • [9] Road Data Enrichment Framework Based on Heterogeneous Data Fusion for ITS
    Rettore, Paulo H. L.
    Santos, Bruno P.
    Lopes, Roberto Rigolin F.
    Maia, Guilherme
    Villas, Leandro A.
    Loureiro, Antonio A. F.
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2020, 21 (04) : 1751 - 1766
  • [10] Motif Enrichment Analysis: a unified framework and an evaluation on ChIP data
    Robert C McLeay
    Timothy L Bailey
    BMC Bioinformatics, 11