Scalable probabilistic truss decomposition using central limit theorem and H-index

被引:2
作者
Esfahani, Fatemeh [1 ]
Daneshmand, Mahsa [1 ]
Srinivasan, Venkatesh [1 ]
Thomo, Alex [1 ]
Wu, Kui [1 ]
机构
[1] Univ Victoria, Victoria, BC, Canada
关键词
Truss decomposition; Dense subgraphs; Probabilistic graphs;
D O I
10.1007/s10619-022-07415-9
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Truss decomposition is a popular notion of hierarchical dense substructures in graphs. In a nutshell, k-truss is the largest subgraph in which every edge is contained in at least k triangles. Truss decomposition aims to compute k-trusses for each possible value of k. There are many works that study truss decomposition in deterministic graphs. However, in probabilistic graphs, truss decomposition is significantly more challenging and has received much less attention; state-of-the-art approaches do not scale well to large probabilistic graphs. Finding the tail probabilities of the number of triangles that contain each edge is a critical challenge of those approaches. This is achieved using dynamic programming which has quadratic run-time and thus not scalable to real large networks which, quite commonly, can have edges contained in many triangles (in the millions). To address this challenge, we employ a special version of the Central Limit Theorem (CLT) to obtain the tail probabilities efficiently. Based on our CLT approach we propose a peeling algorithm for truss decomposition that scales to large probabilistic graphs and offers significant improvement over state-of-the-art. We also design a second method which progressively tightens the estimate of the truss value of each edge and is based on h-index computation. In contrast to our CLT-based approach, our h-index algorithm (1) is progressive by allowing the user to see near-results along the way, (2) does not sacrifice the exactness of final result, and (3) achieves all these while processing only one edge and its immediate neighbors at a time, thus resulting in smaller memory footprint. We perform extensive experiments to show the scalability of both of our proposed algorithms.
引用
收藏
页码:299 / 333
页数:35
相关论文
共 33 条
  • [1] The accuracy of the Gaussian approximation to the sum of independent variates
    Berry, Andrew C.
    [J]. TRANSACTIONS OF THE AMERICAN MATHEMATICAL SOCIETY, 1941, 49 (1-3) : 122 - 136
  • [2] Boldi P., 2004, P 13 INT C WORLD WID, P595, DOI DOI 10.1145/988672.988752
  • [3] Boldi P., 2011, P 20 INT C WORLD WID, P587
  • [4] Core Decomposition of Uncertain Graphs
    Bonchi, Francesco
    Gullo, Francesco
    Kaltenbrunner, Andreas
    Volkovich, Yana
    [J]. PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, : 1316 - 1325
  • [5] The Global Phosphorylation Landscape of SARS-CoV-2 Infection
    Bouhaddou, Mehdi
    Memon, Danish
    Meyer, Bjoern
    White, Kris M.
    Rezelj, Veronica V.
    Marrero, Miguel Correa
    Polacco, Benjamin J.
    Melnyk, James E.
    Ulferts, Svenja
    Kaake, Robyn M.
    Batra, Jyoti
    Richards, Alicia L.
    Stevenson, Erica
    Gordon, David E.
    Rojc, Ajda
    Obernier, Kirsten
    Fabius, Jacqueline M.
    Soucheray, Margaret
    Miorin, Lisa
    Moreno, Elena
    Koh, Cassandra
    Quang Dinh Tran
    Hardy, Alexandra
    Robinot, Remy
    Vallet, Thomas
    Nilsson-Payant, Benjamin E.
    Hernandez-Armenta, Claudia
    Dunham, Alistair
    Weigang, Sebastian
    Knerr, Julian
    Modak, Maya
    Quintero, Diego
    Zhou, Yuan
    Dugourd, Aurelien
    Valdeolivas, Alberto
    Patil, Trupti
    Li, Qiongyu
    Huttenhain, Ruth
    Cakir, Merve
    Muralidharan, Monita
    Kim, Minkyu
    Jang, Gwendolyn
    Tutuncuoglu, Beril
    Hiatt, Joseph
    Guo, Jeffrey Z.
    Xu, Jiewei
    Bouhaddou, Sophia
    Mathy, Christopher J. P.
    Gaulton, Anna
    Manners, Emma J.
    [J]. CELL, 2020, 182 (03) : 685 - +
  • [6] Accelerating Truss Decomposition on Heterogeneous Processors
    Che, Yulin
    Lai, Zhuohang
    Sun, Shixuan
    Wang, Yue
    Luo, Qiong
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 13 (10): : 1751 - 1764
  • [7] Chen Lei, 2012, SYNTH LECT DATA MANA, V4, P1, DOI [DOI 10.1007/978-3-031-01896-1, DOI 10.1007/S11628-012-0160-Z]
  • [8] Cramer H., 1946, Mathematical methods of statistics.
  • [9] Host factor prioritization for pan-viral genetic perturbation screens using random intercept models and network propagation
    Dirmeier, Simon
    Daechert, Christopher
    van Hemert, Martijn
    Tas, Ali
    Ogando, Natacha S.
    van Kuppeveld, Frank
    Bartenschlager, Ralf
    Kaderali, Lars
    Binder, Marco
    Beerenwinkel, Niko
    [J]. PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (02)
  • [10] Guo Y., 2021, BIORXIV, DOI [10.1101/2021.06.23.449535v1.full.pdf, DOI 10.1101/2021.06.23.449535V1.FULL.PDF]