Efficient Graph Query Processing over Geo-Distributed Datacenters

Cited by: 8
Authors
Yuan, Ye [1]
Ma, Delong [2]
Wen, Zhenyu [3]
Ma, Yuliang [2]
Wang, Guoren [1]
Chen, Lei [4]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Northeastern Univ, Shenyang, Peoples R China
[3] Newcastle Univ, Newcastle Upon Tyne, Tyne & Wear, England
[4] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20) | 2020
Keywords
Graph search; Geo-distributed; Datacenters; MapReduce
DOI
10.1145/3397271.3401157
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Graph queries have emerged as one of the fundamental techniques supporting modern search services, such as PageRank web search, social networking search, and knowledge graph search. Because such graphs are maintained globally and are very large (e.g., billions of nodes), graph queries must be processed efficiently across multiple geographically distributed datacenters, that is, as geo-distributed graph queries. Existing graph computing frameworks may not work well for geographically distributed datacenters, because they implement a Bulk Synchronous Parallel model that requires excessive inter-datacenter transfers, thereby introducing very high query-processing latency. In this paper, we propose GeoGraph, a universal framework that supports efficient geo-distributed graph query processing based on datacenter clustering and a meta-graph, while reducing inter-datacenter communication. The framework can be applied to many types of graph algorithms without any modification. It is built on top of Apache Giraph. We conducted experiments with four important graph queries: shortest path, graph keyword search, subgraph isomorphism, and PageRank. The evaluation results show that the proposed framework can achieve up to 82% faster convergence, 42% lower WAN bandwidth usage, and 45% less total monetary cost for the four graph queries, with input graphs stored across ten geo-distributed datacenters.
Pages: 619-628
Page count: 10