Efficient Graph Query Processing over Geo-Distributed Datacenters

Cited by: 8
Authors
Yuan, Ye [1]
Ma, Delong [2]
Wen, Zhenyu [3]
Ma, Yuliang [2]
Wang, Guoren [1]
Chen, Lei [4]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Northeastern Univ, Shenyang, Peoples R China
[3] Newcastle Univ, Newcastle Upon Tyne, Tyne & Wear, England
[4] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Source
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20) | 2020
Keywords
Graph search; Geo-distributed; Datacenters; MapReduce
DOI
10.1145/3397271.3401157
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Graph queries have emerged as one of the fundamental techniques for supporting modern search services, such as PageRank-based web search, social network search, and knowledge graph search. Because such graphs are maintained globally and are very large (e.g., billions of nodes), graph queries must be processed efficiently across multiple geographically distributed datacenters, that is, as geo-distributed graph queries. Existing graph computing frameworks may not work well in this setting because they implement a Bulk Synchronous Parallel model that requires excessive inter-datacenter transfers, which introduces very high query-processing latency. In this paper, we propose GeoGraph, a universal framework that supports efficient geo-distributed graph query processing based on datacenter clustering and a meta-graph, while reducing inter-datacenter communication. The framework can be applied to many types of graph algorithms without any modification and is built on top of Apache Giraph. Experiments were conducted with four important graph queries: shortest path, graph keyword search, subgraph isomorphism, and PageRank. The evaluation results show that the proposed framework achieves up to 82% faster convergence, 42% lower WAN bandwidth usage, and 45% lower total monetary cost for the four graph queries, with input graphs stored across ten geo-distributed datacenters.
Pages: 619-628
Page count: 10