Developing a data pipeline solution for big data processing

Cited by: 2
Authors
Lipovac, Ivona [1]
Babac, Marina Bagic [1]
Affiliations
[1] Univ Zagreb, Fac Elect Engn & Comp, Unska 3, HR-10000 Zagreb, Croatia
Keywords
big data; data pipeline; data processing; data analysis; cloud computing
DOI
10.1504/IJDMMM.2024.136221
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a comprehensive exploration of the concept of big data and its management while highlighting the challenges that arise in the process. The study showcases the development of a data pipeline, designed to facilitate big data collection, integration, and analysis while addressing state-of-the-art challenges, methods, tools, and technologies. Emphasis is placed on pipeline flexibility, with a view towards enabling ease of implementation of architecture changes, seamless integration of new sources, and straightforward implementation of additional transformations in existing pipelines as needed. The pipeline architecture is discussed in detail, with a focus on its design principles, components, and implementation details, as well as the mechanisms used to ensure its reliability, scalability, and performance. Results from a range of experiments demonstrate the pipeline's effectiveness in addressing the challenges of big data management and analysis, as well as its robustness and versatility in accommodating diverse data sources and processing requirements. This study provides insights into the critical role of data pipelines in enabling effective big data management and showcases the importance of flexibility in pipeline design to ensure adaptability to evolving data processing needs.
Pages: 1 / 22
Number of pages: 23
    FRONTIERS OF COMPUTER SCIENCE, 2013, 7 (02) : 165 - 170