Scalable Data Placement of Data-intensive Services in Geo-distributed Clouds

Cited by: 4
Authors
Atrey, Ankita [1]
Van Seghbroeck, Gregory [1]
Volckaert, Bruno [1]
De Turck, Filip [1]
Affiliations
[1] UGent, IDLAB Imec, Technol Pk, Ghent, Belgium
Keywords
Data Placement; Geo-distributed Clouds; Location-based Services; Online Social Networks; Scalability; Spectral Clustering; Hypergraphs; Approximation;
DOI
10.5220/0006767504970508
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The advent of big data analytics and cloud computing technologies has resulted in widespread research into solutions to the data placement problem, which aims at properly placing data items into distributed datacenters. Although traditional schemes that uniformly partition the data across distributed nodes are the de facto standard for many popular distributed data stores such as HDFS or Cassandra, these methods may cause network congestion for data-intensive services, thereby affecting system throughput. This is because, as opposed to MapReduce-style workloads, data-intensive services require access to multiple datasets within each transaction. In this paper, we propose a scalable method for performing data placement of data-intensive services in geographically distributed clouds. The proposed algorithm partitions a set of data items across geo-distributed clouds using spectral clustering on hypergraphs. Additionally, our spectral clustering algorithm leverages randomized techniques for obtaining low-rank approximations of the hypergraph matrix, thereby facilitating superior scalability in computing the spectrum of the hypergraph Laplacian. Experiments on a real-world trace-based online social network dataset show that the proposed algorithm is effective, efficient, and scalable. Empirically, its efficacy on the evaluated metrics is comparable to, or in certain scenarios even better than, that of state-of-the-art techniques, while it runs up to 10 times faster.
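The pipeline named in the abstract (hypergraph construction, a randomized low-rank approximation, then spectral clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the Laplacian or the clustering step, so the sketch assumes a commonly used normalized hypergraph Laplacian L = I - B B^T with B = Dv^{-1/2} H W^{1/2} De^{-1/2}, a basic randomized range finder, and a plain k-means. The function names `hypergraph_embedding`, `kmeans`, and `place_items`, and the toy hypergraph, are hypothetical and introduced only for illustration.

```python
import numpy as np

def hypergraph_embedding(H, w, k, oversample=5, n_power=2, seed=0):
    """Approximate the top-k eigenpairs of Theta = B @ B.T (L = I - Theta).

    H: (items x hyperedges) 0/1 incidence matrix; w: hyperedge weights.
    Assumes every item lies in at least one hyperedge and no hyperedge is empty.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    dv = H @ w            # item degree: total weight of incident hyperedges
    de = H.sum(axis=0)    # hyperedge degree: number of items it contains
    # B = Dv^{-1/2} H W^{1/2} De^{-1/2}
    B = (H / np.sqrt(dv)[:, None]) * np.sqrt(w / de)[None, :]
    # Randomized range finder: project onto a small random subspace.
    rng = np.random.default_rng(seed)
    r = min(k + oversample, min(B.shape))
    Q, _ = np.linalg.qr(B @ rng.standard_normal((B.shape[1], r)))
    for _ in range(n_power):  # power iterations sharpen the subspace
        Q, _ = np.linalg.qr(B @ (B.T @ Q))
    # Exact SVD of the small projected matrix, lifted back to item space.
    U_small, s, _ = np.linalg.svd(Q.T @ B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k] ** 2

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def place_items(H, w, k):
    """Assign each data item to one of k partitions (e.g. datacenters)."""
    U, _ = hypergraph_embedding(H, w, k)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize
    return kmeans(U, k)

# Toy input (hypothetical): each hyperedge is a request touching several items.
edges = [(0, 1, 2), (0, 1), (1, 2), (3, 4, 5), (3, 4), (4, 5), (2, 3)]
w_demo = np.array([1, 1, 1, 1, 1, 1, 0.1])  # the cross-group request is rare
H_demo = np.zeros((6, len(edges)))
for j, e in enumerate(edges):
    H_demo[list(e), j] = 1.0
labels = place_items(H_demo, w_demo, k=2)
```

On this toy input, items 0 to 2 and items 3 to 5 are each accessed together far more often than across groups, so the two groups should land in different partitions. Working with the factor B rather than the full item-by-item matrix is the design choice that keeps the sketch cheap: only products with B and its transpose are needed, which is where a randomized low-rank approach can save time on large hypergraphs.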
Pages: 497 - 508
Page count: 12