Counting triangles in real-world networks using projections

被引:18
作者
Tsourakakis, Charalampos E. [1 ]
机构
[1] Carnegie Mellon Univ, Machine Learning Dept, Sch Comp Sci, Pittsburgh, PA 15213 USA
基金
美国国家科学基金会;
关键词
Triangles; Network analysis; SVD; Algorithms;
D O I
10.1007/s10115-010-0291-2
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Triangle counting is an important problem in graph mining. Two frequently used metrics in complex network analysis that require the count of triangles are the clustering coefficients and the transitivity ratio of the graph. Triangles have been used successfully in several real-world applications, such as detection of spamming activity, uncovering the hidden thematic structure of the web and link recommendation in online social networks. Furthermore, the count of triangles is a frequently used network statistic in exponential random graph models. However, counting the number of triangles in a graph is computationally expensive. In this paper, we propose the EigenTriangle and EigenTriangleLocal algorithms to estimate the number of triangles in a graph. The efficiency of our algorithms is based on the special spectral properties of real-world networks, which allow us to approximate accurately the number of triangles. We verify the efficacy of our method experimentally in almost 160 experiments using several Web Graphs, social, co-authorship, information, and Internet networks where we obtain significant speedups with respect to a straightforward triangle counting algorithm. Furthermore, we propose an algorithm based on Fast SVD which allows us to apply the core idea of the EigenTriangle algorithm on graphs which do not fit in the main memory. The main idea is a simple node-sampling process according to which node i is selected with probability d(i)/2m where d (i) is the degree of node i and m is the total number of edges in the graph. Our theoretical contributions also include a theorem that gives a closed formula for the number of triangles in Kronecker graphs, a model of networks which mimics several properties of real-world networks.
引用
收藏
页码:501 / 520
页数:20
相关论文
共 48 条
[11]  
Bollobas B., 2001, RANDOM GRAPHS, DOI 10.1017/CBO9780511814068
[12]  
BRODER AZ, 1998, S THEOR COMP
[13]  
BURIOL L, 2006, PRINCE DATABASE SYST
[14]   Power-Law Distributions in Empirical Data [J].
Clauset, Aaron ;
Shalizi, Cosma Rohilla ;
Newman, M. E. J. .
SIAM REVIEW, 2009, 51 (04) :661-703
[15]  
COPPERSMITH D, 1987, S THEOR COMP
[16]  
CUPPEN JJM, 1981, NUMER MATH, V36, P177, DOI 10.1007/BF01396757
[17]  
DEERWESTER S, 1990, J AM SOC INFORM SCI, V41, P391, DOI 10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO
[18]  
2-9
[19]   Clustering large graphs via the Singular Value Decomposition [J].
Drineas, P ;
Frieze, A ;
Kannan, R ;
Vempala, S ;
Vinay, V .
MACHINE LEARNING, 2004, 56 (1-3) :9-33
[20]   Curvature of co-links uncovers hidden thematic layers in the World Wide Web [J].
Eckmann, JP ;
Moses, E .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (09) :5825-5829