Jaccard-constrained dense subgraph discovery

被引:0
作者
Arachchi, Chamalee Wickrama [1 ]
Tatti, Nikolaj [1 ]
机构
[1] Univ Helsinki, HIIT, Helsinki, Finland
关键词
Dense subgraphs; Jaccard index; Jaccard-constrained; Jaccard-weighted; K-SUBGRAPH; APPROXIMATION; HARD;
D O I
10.1007/s10994-024-06595-y
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Finding dense subgraphs is a core problem in graph mining with many applications in diverse domains. At the same time many real-world networks vary over time, that is, the dataset can be represented as a sequence of graph snapshots. Hence, it is natural to consider the question of finding dense subgraphs in a temporal network that are allowed to vary over time to a certain degree. In this paper, we search for dense subgraphs that have large pairwise Jaccard similarity coefficients. More formally, given a set of graph snapshots and input parameter alpha\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha$$\end{document}, we find a collection of dense subgraphs, with pairwise Jaccard index at least alpha\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha$$\end{document}, such that the sum of densities of the induced subgraphs is maximized. We prove that this problem is NP-hard and we present a greedy, iterative algorithm which runs in Onk2+m\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {O}} \mathopen {} \left( nk<^>2 + m\right)$$\end{document} time per single iteration, where k is the length of the graph sequence and n and m denote number of vertices and total number of edges respectively. We also consider an alternative problem where subgraphs with large pairwise Jaccard indices are rewarded. We do this by incorporating the indices directly into the objective function. More formally, given a set of graph snapshots and a weight lambda\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda$$\end{document}, we find a collection of dense subgraphs such that the sum of densities of the induced subgraphs plus the sum of Jaccard indices, weighted by lambda\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda$$\end{document}, is maximized. We prove that this problem is NP-hard. To discover dense subgraphs with good objective value, we present an iterative algorithm which runs in On2k2+mlogn+k3n\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {O}} \mathopen {}\left( n<^>2k<^>2 + m \log n + k<^>3 n\right)$$\end{document} time per single iteration, and a greedy algorithm which runs in On2k2+mlogn+k3n\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathcal {O}} \mathopen {}\left( n<^>2k<^>2 + m \log n + k<^>3 n\right)$$\end{document} time. We show experimentally that our algorithms are efficient, they can find ground truth in synthetic datasets and provide good results from real-world datasets. Finally, we present two case studies that show the usefulness of our problem.
引用
收藏
页码:7103 / 7125
页数:23
相关论文
共 26 条
[1]  
Andersen R, 2009, LECT NOTES COMPUT SC, V5427, P25, DOI 10.1007/978-3-540-95995-3_3
[2]   Finding Subgraphs with Maximum Total Density and Limited Overlap [J].
Balalau, Oana Denisa ;
Bonchi, Francesco ;
Chany, T-H. Hubert ;
Gullo, Francesco ;
Sozioz, Mauro .
WSDM'15: PROCEEDINGS OF THE EIGHTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2015, :379-388
[3]  
Bhaskara A, 2010, ACM S THEORY COMPUT, P201
[4]  
Charikar M., 2018, ARXIV, DOI DOI 10.48550/ARXIV.1802.06361
[5]  
Charikar M., 2000, APPROX VOLUME 1913 L, V1913, P84, DOI 10.1007/3-540-44436-X10
[6]  
Chlebík M, 2003, LECT NOTES COMPUT SC, V2653, P152
[7]  
Du XX, 2009, KDD-09: 15TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, P1135
[8]   The dense k-subgraph problem [J].
Feige, U ;
Kortsarz, G ;
Peleg, D .
ALGORITHMICA, 2001, 29 (03) :410-421
[9]   MotifCut: regulatory motifs finding with maximum density subgraphs [J].
Fratkin, Eugene ;
Naughton, Brian T. ;
Brutlag, Douglas L. ;
Batzoglou, Serafim .
BIOINFORMATICS, 2006, 22 (14) :E150-E157
[10]   Top-k overlapping densest subgraphs [J].
Galbrun, Esther ;
Gionis, Aristides ;
Tatti, Nikolaj .
DATA MINING AND KNOWLEDGE DISCOVERY, 2016, 30 (05) :1134-1165