Mining Top-k Pairs of Correlated Subgraphs in a Large Network

被引:10
作者
Prateek, Arneish [1 ]
Khan, Arijit [2 ]
Goyal, Akshit [1 ]
Ranu, Sayan [1 ]
机构
[1] Indian Inst Technol, Delhi, India
[2] Nanyang Technol Univ, Singapore, Singapore
来源
PROCEEDINGS OF THE VLDB ENDOWMENT | 2020年 / 13卷 / 09期
关键词
TOOL; ALIGNMENT;
D O I
10.14778/3397230.3397245
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
yWe investigate the problem of correlated subgraphs mining (CSM) where the goal is to identify pairs of subgraph patterns that frequently co-occur in proximity within a single graph. Correlated subgraph patterns are different from frequent subgraphs due to the flexibility in connections between constituent subgraph instances and thus, existing frequent subgraphs mining algorithms cannot be directly applied for CSM. Moreover, computing the degree of correlation between two patterns requires enumerating and finding distances between every pair of subgraph instances of both patterns - a task that is both memory-intensive as well as computationally demanding. To this end, we propose two holistic best-first exploration algorithms: CSM-E (an exact method) and CSM-A (a more efficient approximate method with near-optimal quality). To further improve efficiency, we propose a top-k pruning strategy, while to reduce memory footprint, we develop a compressed data structure called Replica, which stores all instances of a subgraph pattern on demand. Our empirical results demonstrate that the proposed algorithms not only mine interesting correlations, but also achieve good scalability over large networks.
引用
收藏
页码:1511 / 1524
页数:14
相关论文
共 58 条
  • [1] Al Hasan M., 2009, Proc. VLDB, V2, P730, DOI DOI 10.14778/1687627.1687710
  • [2] [Anonymous], 1995, FOCS
  • [3] [Anonymous], ICDE
  • [4] [Anonymous], 1994, VLDB
  • [5] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [6] Borgelt C., 2002, ICDM
  • [7] Bringmann B., 2008, PAKDD
  • [8] Challenging the Time Complexity of Exact Subgraph Isomorphism for Huge and Dense Graphs with VF3
    Carletti, Vincenzo
    Foggia, Pasquale
    Saggese, Alessia
    Vento, Mario
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (04) : 804 - 818
  • [9] Chen Chen., 2009, PVLDB, V2, P742, DOI DOI 10.14778/1687627.1687711
  • [10] Substructure Discovery Using Minimum Description Length and Background Knowledge
    Cook, Diane J.
    Holder, Lawrence B.
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1993, 1 : 231 - 255