Efficient incremental loading in ETL processing for real-time data integration

Cited by: 0
Authors
Neepa Biswas
Anamitra Sarkar
Kartick Chandra Mondal
Affiliation
[1] Jadavpur University, Department of Information Technology
Keywords
Data warehouse; Code-based ETL; ETL tools; Pygrametl; Petl; Scriptella; Incremental load; Bulk load; CDC
DOI
Not available
Abstract
ETL (extract, transform, load) is the widely used standard process for creating and maintaining a data warehouse (DW). ETL is the most resource-, cost- and time-demanding process in DW implementation and maintenance. Nowadays, many graphical user interface (GUI)-based solutions are available to facilitate ETL processes. Despite the high popularity of GUI-based tools, this approach still has some downsides. This paper focuses on an alternative approach to ETL development: hand coding. In some contexts, such as research and academic work, it is appropriate to opt for a custom-coded solution, which can be cheaper, faster and more maintainable than GUI-based tools. Several well-known code-based open-source ETL tools developed by the academic community are studied in this article, and their architecture and implementation details are addressed. The aim of this paper is to present a comparative evaluation of these code-based ETL tools. Finally, an efficient ETL model is designed to meet present-day near real-time requirements.
Pages: 53 - 61
Number of pages: 8
Related papers
50 in total
  • [1] Efficient incremental loading in ETL processing for real-time data integration
    Biswas, Neepa
    Sarkar, Anamitra
    Mondal, Kartick Chandra
    INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2020, 16 (01) : 53 - 61
  • [2] Real-Time Snapshot Maintenance with Incremental ETL Pipelines in Data Warehouses
    Qu, Weiping
    Basavaraj, Vinanthi
    Shankar, Sahana
    Dessloch, Stefan
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, 2015, 9263 : 217 - 228
  • [3] Real-Time Data ETL Framework for Big Real-Time Data Analysis
    Li, Xiaofang
    Mao, Yingchi
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 1289 - 1294
  • [4] HBelt: Integrating an Incremental ETL Pipeline with a Big Data Store for Real-Time Analytics
    Qu, Weiping
    Shankar, Sahana
    Ganza, Sandy
    Dessloch, Stefan
    ADVANCES IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2015, 2015, 9282 : 123 - 137
  • [5] An ETL Strategy for Real-Time Data Warehouse
    Zhou, Haihe
    Yang, Dingyu
    Xu, Yang
    PRACTICAL APPLICATIONS OF INTELLIGENT SYSTEMS, 2011, 124 : 329 - +
  • [6] GAUSSIAN INTEGRATION FOR REAL-TIME DATA-PROCESSING
    WYLER, J
    BENEDICT, RP
    INSTRUMENTS & CONTROL SYSTEMS, 1973, 46 (05) : 67 - 68
  • [7] A programmable real-time data processing and display system for the NOAA/ETL Doppler radars
    Campbell, WC
    Gibson, JS
    28TH CONFERENCE ON RADAR METEOROLOGY, 1997, : 178 - 179
  • [8] AScale: Big/Small Data ETL and Real-Time Data Freshness
    Martins, Pedro
    Abbasi, Maryam
    Furtado, Pedro
    BEYOND DATABASES, ARCHITECTURES AND STRUCTURES, BDAS 2016, 2016, 613 : 315 - 327
  • [9] Distributed real-time ETL architecture for unstructured big data
    Mehmood, Erum
    Anees, Tayyaba
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (12) : 3419 - 3445