Efficient Graph Query Processing over Geo-Distributed Datacenters

Cited by: 8
Authors
Yuan, Ye [1]
Ma, Delong [2]
Wen, Zhenyu [3]
Ma, Yuliang [2]
Wang, Guoren [1]
Chen, Lei [4]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Northeastern Univ, Shenyang, Peoples R China
[3] Newcastle Univ, Newcastle Upon Tyne, Tyne & Wear, England
[4] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20) | 2020
Keywords
Graph search; Geo-distributed; Datacenters; MapReduce
DOI
10.1145/3397271.3401157
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Graph queries have emerged as one of the fundamental techniques to support modern search services, such as PageRank web search, social networking search and knowledge graph search. As such graphs are maintained globally and are very large (e.g., billions of nodes), we need to process graph queries efficiently across multiple geographically distributed datacenters, that is, to run geo-distributed graph queries. Existing graph computing frameworks may not work well for geographically distributed datacenters, because they implement a Bulk Synchronous Parallel model that requires excessive inter-datacenter transfers, thereby introducing very high latency for query processing. In this paper, we propose GeoGraph, a universal framework to support efficient geo-distributed graph query processing based on clustering datacenters and a meta-graph, while reducing inter-datacenter communication. Our new framework can be applied to many types of graph algorithms without any modification. The framework is developed on top of Apache Giraph. The experiments were conducted by applying four important graph queries, i.e., shortest path, graph keyword search, subgraph isomorphism and PageRank. The evaluation results show that our proposed framework can achieve up to 82% faster convergence, 42% lower WAN bandwidth usage, and 45% less total monetary cost for the four graph queries, with input graphs stored across ten geo-distributed datacenters.
Pages: 619-628
Page count: 10