Scalable Data Placement of Data-intensive Services in Geo-distributed Clouds

Cited by: 4
Authors
Atrey, Ankita [1]
Van Seghbroeck, Gregory [1]
Volckaert, Bruno [1]
De Turck, Filip [1]
Affiliations
[1] UGent, IDLAB Imec, Technol Pk, Ghent, Belgium
Keywords
Data Placement; Geo-distributed Clouds; Location-based Services; Online Social Networks; Scalability; Spectral Clustering; Hypergraphs; Approximation;
DOI
10.5220/0006767504970508
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The advent of big data analytics and cloud computing technologies has prompted widespread research into the data placement problem, which aims at properly placing data items into distributed datacenters. Although traditional schemes that uniformly partition the data across distributed nodes are the de facto standard for many popular distributed data stores such as HDFS or Cassandra, these methods may cause network congestion for data-intensive services, thereby reducing system throughput. This is because, as opposed to MapReduce-style workloads, data-intensive services require access to multiple datasets within each transaction. In this paper, we propose a scalable method for performing data placement of data-intensive services into geographically distributed clouds. The proposed algorithm partitions a set of data items into geo-distributed clouds using spectral clustering on hypergraphs. Additionally, our spectral clustering algorithm leverages randomized techniques for obtaining low-rank approximations of the hypergraph matrix, thereby facilitating superior scalability when computing the spectra of the hypergraph Laplacian. Experiments on a real-world trace-based online social network dataset show that the proposed algorithm is effective, efficient, and scalable. Empirically, its efficacy on the evaluated metrics is comparable to, and in certain scenarios better than, that of state-of-the-art techniques, while its running time is up to 10 times faster.
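The general technique named in the abstract (spectral clustering on a hypergraph, with a randomized low-rank approximation used to obtain the leading spectrum) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the commonly used normalized hypergraph Laplacian L = I - B B^T with B = D_v^{-1/2} H W^{1/2} D_e^{-1/2}, a basic randomized range finder with power iterations, and a simple k-means step. The function names, the toy incidence matrix, and all parameter values are hypothetical.

```python
import numpy as np

def randomized_left_singular_vectors(B, k, oversample=5, n_power_iter=2, seed=0):
    """Approximate the top-k left singular vectors of B with a random sketch."""
    rng = np.random.default_rng(seed)
    l = min(k + oversample, min(B.shape))          # sketch width
    Y = B @ rng.standard_normal((B.shape[1], l))   # sample the range of B
    for _ in range(n_power_iter):                  # power iterations sharpen the spectrum
        Q, _ = np.linalg.qr(Y)
        Y = B @ (B.T @ Q)
    Q, _ = np.linalg.qr(Y)
    U_small, _, _ = np.linalg.svd(Q.T @ B, full_matrices=False)
    return (Q @ U_small)[:, :k]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def place_items(H, w, k):
    """Assign each data item (row of H) to one of k partitions.

    H: items x hyperedges incidence matrix (assumed: one hyperedge per
       transaction, linking the items it accesses together).
    w: hyperedge weights (assumed: e.g. transaction frequency).
    Assumes every item appears in a hyperedge and no hyperedge is empty.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w                                    # weighted vertex degrees
    d_e = H.sum(axis=0)                            # hyperedge sizes
    B = H * np.sqrt(w / d_e) / np.sqrt(d_v)[:, None]
    # The smallest eigenvectors of L = I - B B^T are the top left singular vectors of B.
    U = randomized_left_singular_vectors(B, k)
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, k)

# Toy example (hypothetical data): 8 items, 4 transactions, two disjoint access groups.
H = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
w = np.ones(4)
labels = place_items(H, w, k=2)
print(labels)
```

In this toy input, items 0 to 3 are only ever accessed together, as are items 4 to 7, so the two groups end up in different partitions. A real deployment would also need to account for datacenter capacities and locations, which this sketch leaves out.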
Pages: 497-508
Number of pages: 12