Scalable probabilistic truss decomposition using central limit theorem and H-index

被引：2

作者：

Esfahani, Fatemeh ^{[1
]}

Daneshmand, Mahsa ^{[1
]}

Srinivasan, Venkatesh ^{[1
]}

Thomo, Alex ^{[1
]}

Wu, Kui ^{[1
]}

机构：

[1] Univ Victoria, Victoria, BC, Canada

来源：

DISTRIBUTED AND PARALLEL DATABASES | 2022年 / 40卷 / 2-3期

关键词：

Truss decomposition; Dense subgraphs; Probabilistic graphs;

D O I：

10.1007/s10619-022-07415-9

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Truss decomposition is a popular notion of hierarchical dense substructures in graphs. In a nutshell, k-truss is the largest subgraph in which every edge is contained in at least k triangles. Truss decomposition aims to compute k-trusses for each possible value of k. There are many works that study truss decomposition in deterministic graphs. However, in probabilistic graphs, truss decomposition is significantly more challenging and has received much less attention; state-of-the-art approaches do not scale well to large probabilistic graphs. Finding the tail probabilities of the number of triangles that contain each edge is a critical challenge of those approaches. This is achieved using dynamic programming which has quadratic run-time and thus not scalable to real large networks which, quite commonly, can have edges contained in many triangles (in the millions). To address this challenge, we employ a special version of the Central Limit Theorem (CLT) to obtain the tail probabilities efficiently. Based on our CLT approach we propose a peeling algorithm for truss decomposition that scales to large probabilistic graphs and offers significant improvement over state-of-the-art. We also design a second method which progressively tightens the estimate of the truss value of each edge and is based on h-index computation. In contrast to our CLT-based approach, our h-index algorithm (1) is progressive by allowing the user to see near-results along the way, (2) does not sacrifice the exactness of final result, and (3) achieves all these while processing only one edge and its immediate neighbors at a time, thus resulting in smaller memory footprint. We perform extensive experiments to show the scalability of both of our proposed algorithms.

引用

页码：299 / 333

页数：35

共 33 条

[1] The accuracy of the Gaussian approximation to the sum of independent variates
Berry, Andrew C.
[J]. TRANSACTIONS OF THE AMERICAN MATHEMATICAL SOCIETY, 1941, 49 (1-3) : 122 - 136
[2] Boldi P., 2004, P 13 INT C WORLD WID, P595, DOI DOI 10.1145/988672.988752
[3] Boldi P., 2011, P 20 INT C WORLD WID, P587
[4] Core Decomposition of Uncertain Graphs
Bonchi, Francesco
Gullo, Francesco
Kaltenbrunner, Andreas
Volkovich, Yana
[J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 1316 - 1325
[5] The Global Phosphorylation Landscape of SARS-CoV-2 Infection
Bouhaddou, Mehdi
Memon, Danish
Meyer, Bjoern
White, Kris M.
Rezelj, Veronica V.
Marrero, Miguel Correa
Polacco, Benjamin J.
Melnyk, James E.
Ulferts, Svenja
Kaake, Robyn M.
Batra, Jyoti
Richards, Alicia L.
Stevenson, Erica
Gordon, David E.
Rojc, Ajda
Obernier, Kirsten
Fabius, Jacqueline M.
Soucheray, Margaret
Miorin, Lisa
Moreno, Elena
Koh, Cassandra
Quang Dinh Tran
Hardy, Alexandra
Robinot, Remy
Vallet, Thomas
Nilsson-Payant, Benjamin E.
Hernandez-Armenta, Claudia
Dunham, Alistair
Weigang, Sebastian
Knerr, Julian
Modak, Maya
Quintero, Diego
Zhou, Yuan
Dugourd, Aurelien
Valdeolivas, Alberto
Patil, Trupti
Li, Qiongyu
Huttenhain, Ruth
Cakir, Merve
Muralidharan, Monita
Kim, Minkyu
Jang, Gwendolyn
Tutuncuoglu, Beril
Hiatt, Joseph
Guo, Jeffrey Z.
Xu, Jiewei
Bouhaddou, Sophia
Mathy, Christopher J. P.
Gaulton, Anna
Manners, Emma J.
[J]. CELL, 2020, 182 (03) : 685 - +
[6] Accelerating Truss Decomposition on Heterogeneous Processors
Che, Yulin
Lai, Zhuohang
Sun, Shixuan
Wang, Yue
Luo, Qiong
[J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (10): : 1751 - 1764
[7] Chen Lei, 2012, SYNTH LECT DATA MANA, V4, P1, DOI [DOI 10.1007/978-3-031-01896-1, DOI 10.1007/S11628-012-0160-Z]
[8] Cramer H., 1946, Mathematical methods of statistics.
[9] Host factor prioritization for pan-viral genetic perturbation screens using random intercept models and network propagation
Dirmeier, Simon
Daechert, Christopher
van Hemert, Martijn
Tas, Ali
Ogando, Natacha S.
van Kuppeveld, Frank
Bartenschlager, Ralf
Kaderali, Lars
Binder, Marco
Beerenwinkel, Niko
[J]. PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (02)
[10] Guo Y., 2021, BIORXIV, DOI [10.1101/2021.06.23.449535v1.full.pdf, DOI 10.1101/2021.06.23.449535V1.FULL.PDF]

← 1 2 3 4 →