OPT: A New Framework for Overlapped and Parallel Triangulation in Large-scale Graphs

被引:33
|
作者
Kim, Jinha [1 ]
Han, Wook-Shin [1 ]
Lee, Sangyeon [1 ]
Park, Kyungyeol [1 ]
Yu, Hwanjo [1 ]
机构
[1] Pohang Univ Sci & Technol POSTECH, Pohang, South Korea
来源
SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA | 2014年
基金
新加坡国家研究基金会;
关键词
Triangulation; Big data; Parallel processing;
D O I
10.1145/2588555.2588563
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Graph triangulation, which finds all triangles in a graph, has been actively studied due to its wide range of applications in the network analysis and data mining. With the rapid growth of graph data size, disk-based triangulation methods are in demand but little researched. To handle a large-scale graph which does not fit in memory, we must iteratively load small parts of the graph. In the existing literature, achieving the ideal cost has been considered to be impossible for billion-scale graphs due to the memory size constraint. In this paper, we propose an overlapped and parallel disk-based triangulation framework for billion-scale graphs, OPT, which achieves the ideal cost by (1) full overlap of the CPU and :1/0 operations and (2) full parallelism of multi-core CPU and FlashSSD I/O. In OPT, triangles in memory are called the internal triangles while triangles constituting vertices in memory and vertices in external memory are called the external triangles. At the macro level, OPT overlaps the internal triangulation and the external triangulation, while it overlaps the CPU and I/O operations at the micro level. Thereby, the cost of OPT is close to the ideal cost. Moreover, OPT instantiates both vertex-iterator and edge-iterator models and benefits from multi-thread parallelism on both types of triangulation. Extensive experiments conducted on large-scale datasets showed that (1) OPT achieved the elapsed time close to that of the ideal method with less than 7% of overhead under the limited memory budget, (2) OPT achieved linear speed-up with an increasing number of CPU cores, (3) OPT outperforms the state-ofthe-art parallel method by up to an order of magnitude with 6 CPU cores, and (4) for the first time in the literature, the triangulation results are reported for a billion-vertex scale real-world graph.
引用
收藏
页码:637 / 648
页数:12
相关论文
共 50 条
  • [1] TIPP: Parallel Delaunay Triangulation for Large-Scale Datasets
    Nguyen, Cuong
    Rhodes, Philip J.
    30TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT (SSDBM 2018), 2018,
  • [2] Parallel generation of large-scale random graphs
    Vullikanti, Anil
    2018 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2018), 2018, : 278 - 278
  • [3] NEW OVERLAPPED DECENTRALIZED CONTROLLER FOR LARGE-SCALE SYSTEMS
    HASSAN, MF
    BAHNASAWI, AA
    RAGAB, RH
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 1987, 18 (10) : 1963 - 1977
  • [4] New overlapped decentralized controller for large-scale systems
    Hassan, Mohamed F.
    Bahnasawi, Ahmed A.
    Ragab, Rashad H.
    1963, Taylor and Francis Ltd. (18):
  • [5] Large-scale parallel simulation with implicit conservative overlapped grid for flow
    Wang, Ziwei
    Fan, Zhaolin
    Liu, Fan
    Jiang, Xiong
    Li, Bin
    Qiu, Ming
    COMPUTERS & FLUIDS, 2022, 238
  • [6] A new parallel framework algorithm for solving large-scale DEA models
    Muren
    Ma, Zhanxin
    Li, Hao
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 241
  • [7] A framework for parallel large-scale global optimization
    Evtushenko, Yuri
    Posypkin, Mikhail
    Sigal, Israel
    COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2009, 23 (3-4): : 211 - 215
  • [8] Parallel Framework for Dimensionality Reduction of Large-Scale Datasets
    Samudrala, Sai Kiranmayee
    Zola, Jaroslaw
    Aluru, Srinivas
    Ganapathysubramanian, Baskar
    SCIENTIFIC PROGRAMMING, 2015, 2015
  • [9] Hub Labels on the database for large-scale graphs with the COLD framework
    Efentakis, Alexandros
    Efstathiades, Christodoulos
    Pfoser, Dieter
    GEOINFORMATICA, 2017, 21 (04) : 703 - 732
  • [10] A Dynamic Programming Framework for Large-Scale Online Clustering on Graphs
    Li, Yantao
    Zhao, Xiang
    Qu, Zehui
    NEURAL PROCESSING LETTERS, 2020, 52 (02) : 1613 - 1629