A Model and Survey of Distributed Data-Intensive Systems

Cited by: 4
Authors
Margara, Alessandro [1]
Cugola, Gianpaolo [1]
Felicioni, Nicolo [1]
Cilloni, Stefano [1]
Affiliation
[1] Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy
Keywords
Data-intensive systems; distributed systems; data management; data processing; model; taxonomy; TRANSACTIONS; MANAGEMENT; ENGINE; SCALE
DOI
10.1145/3604801
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Data is a precious resource in today's society, and it is generated at an unprecedented and constantly growing pace. The need to store, analyze, and make data promptly available to a multitude of users introduces formidable challenges in modern software platforms. These challenges have radically impacted the research fields that gravitate around data management and processing, with the introduction of distributed data-intensive systems that offer innovative programming models and implementation strategies to handle data characteristics such as its volume, the rate at which it is produced, its heterogeneity, and its distribution. Each data-intensive system makes its own choices in terms of data model, usage assumptions, synchronization, processing strategy, deployment, and guarantees in terms of consistency, fault tolerance, and ordering. Yet, the problems data-intensive systems face and the solutions they propose frequently overlap. This article proposes a unifying model that dissects the core functionalities of data-intensive systems and discusses alternative design and implementation strategies, pointing out their assumptions and implications. The model offers a common ground to understand and compare highly heterogeneous solutions, with the potential to foster cross-fertilization across research communities. We apply our model by classifying tens of systems: an exercise that leads to interesting observations on the current trends in the domain of data-intensive systems and suggests open research directions.
Pages: 69
Related papers
50 in total
  • [1] On the Flexibility of Data Fulfillment Locations in Data-intensive Distributed Systems
    Yu, Boyang
    Pan, Jianping
    2016 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2016
  • [2] The Quest for Scalable Support of Data-Intensive Workloads in Distributed Systems
    Raicu, Ioan
    Foster, Ian T.
    Zhao, Yong
    Little, Philip
    Moretti, Christopher M.
    Chaudhary, Amitabh
    Thain, Douglas
    HPDC'09: 18TH ACM INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, 2009, : 207 - 216
  • [3] Model and data engineering for advanced data-intensive systems and applications
    Ouhammou, Yassine
    Bellatreche, Ladjel
    Ivanovic, Mirjana
    Abelló, Alberto
    Computing, 2019, 101 : 1391 - 1395
  • [4] Optimizing Distributed Data-Intensive Workflows
    Friese, Ryan D.
    Tallent, Nathan R.
    Schram, Malachi
    Halappanavar, Mahantesh
    Barker, Kevin J.
    2018 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2018, : 279 - 289
  • [5] Model and data engineering for advanced data-intensive systems and applications
    Ouhammou, Yassine
    Bellatreche, Ladjel
    Ivanovic, Mirjana
    Abello, Alberto
    COMPUTING, 2019, 101 (10) : 1391 - 1395
  • [6] Sizing data-intensive systems from ER model
    Tan, HBK
    Zhao, Y
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (04) : 1321 - 1326
  • [7] Intelligent Data-Intensive IoT: A Survey
    Xiao, Bin
    Rahmani, Rahim
    Li, Yuhong
    Gillblad, Daniel
    Kanter, Theo
    2016 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC), 2016, : 2362 - 2368
  • [8] Approximation algorithms and heuristics for task scheduling in data-intensive distributed systems
    Povoa, Marcelo G.
    Xavier, Eduardo C.
    INTERNATIONAL TRANSACTIONS IN OPERATIONAL RESEARCH, 2018, 25 (05) : 1417 - 1441
  • [9] Software architecture for large-scale, distributed, data-intensive systems
    Mattmann, CA
    Crichton, DJ
    Hughes, JS
    Kelly, SC
    Ramirez, PM
    FOURTH WORKING IEEE/IFIP CONFERENCE ON SOFTWARE ARCHITECTURE (WICSA 2004), PROCEEDINGS, 2004, : 255 - 264
  • [10] A Survey of Distributed and Data Intensive CBR Systems
    Mata, Aitor
    INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE 2008, 2009, 50 : 582 - 586