Developing a data pipeline solution for big data processing

Cited by: 2
Authors
Lipovac, Ivona [1]
Babac, Marina Bagic [1]
Affiliation
[1] Univ Zagreb, Fac Elect Engn & Comp, Unska 3, HR-10000 Zagreb, Croatia
Keywords
big data; data pipeline; data processing; data analysis; cloud computing
DOI
10.1504/IJDMMM.2024.136221
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a comprehensive exploration of the concept of big data and its management, highlighting the challenges that arise in the process. The study showcases the development of a data pipeline designed to facilitate big data collection, integration, and analysis while addressing current challenges and drawing on state-of-the-art methods, tools, and technologies. Emphasis is placed on pipeline flexibility, so that architecture changes are easy to implement, new sources can be integrated seamlessly, and additional transformations can be added to existing pipelines as needed. The pipeline architecture is discussed in detail, with a focus on its design principles, components, and implementation details, as well as the mechanisms used to ensure its reliability, scalability, and performance. Results from a range of experiments demonstrate the pipeline's effectiveness in addressing the challenges of big data management and analysis, as well as its robustness and versatility in accommodating diverse data sources and processing requirements. This study provides insights into the critical role of data pipelines in enabling effective big data management and shows the importance of flexibility in pipeline design for adapting to evolving data processing needs.
Pages: 1-22
Number of pages: 23
Related papers
50 in total
  • [31] Streaming Big Data Processing in Datacenter Clouds
    Ranjan, Rajiv
    IEEE CLOUD COMPUTING, 2014, 1 (01) : 78 - 83
  • [32] The method of Big data processing
    Shakhovska, Natalya
    PROCEEDINGS OF THE 2017 12TH INTERNATIONAL SCIENTIFIC AND TECHNICAL CONFERENCE ON COMPUTER SCIENCES AND INFORMATION TECHNOLOGIES (CSIT 2017), VOL. 1, 2017, : 122 - 126
  • [33] THE CORRELATION ANALYSIS OF THE BIG DATA FOR PIPELINE DEFECT
    Zhang Hewei
    Dong Shaohua
    Zhang Laibin
    PROCEEDINGS OF THE ASME PRESSURE VESSELS AND PIPING CONFERENCE, 2017, VOL 2, 2017,
  • [34] Development of Big Data-Analysis Pipeline for Mobile Phone Data with Mobipack and Spatial Enhancement
    Witayangkurn, Apichon
    Arai, Ayumi
    Shibasaki, Ryosuke
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2022, 11 (03)
  • [35] A Distributed Pipeline for DIDSON Data Processing
    Li, Liling
    Danner, Tyler
    Eickholt, Jesse
    McCann, Erin
    Pangle, Kevin
    Johnson, Nicholas
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 4301 - 4306
  • [36] Developing Concept Enriched Models for Processing Big Data Within the Medical Domain
    Gudivada, Akhil
    Tabrizi, Nasseh
    PROCEEDINGS OF THE 2019 IEEE 18TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC 2019), 2019, : 222 - 229
  • [37] Big data and data processing in rheumatology: bioethical perspectives
    Manrique de Lara, Amaranta
    Peláez-Ballestas, Ingris
    CLINICAL RHEUMATOLOGY, 2020, 39 (04) : 1007 - 1014
  • [38] Data Processing for Direct Marketing Through Big Data
    Viloria, Amelec
    Varela, Noel
    Maldonado Perez, Doyreg
    Lezama, Omar Bonerge Pineda
    COMPUTATIONAL VISION AND BIO-INSPIRED COMPUTING, 2020, 1108 : 187 - 192