Evaluating Overfit and Underfit in Models of Network Community Structure
被引:92
|
作者:
Ghasemian, Amir
论文数: 0引用数: 0
h-index: 0
机构:
Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USAUniv Colorado, Dept Comp Sci, Boulder, CO 80309 USA
Ghasemian, Amir
[1
]
Hosseinmardi, Homa
论文数: 0引用数: 0
h-index: 0
机构:
Univ Southern Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USAUniv Colorado, Dept Comp Sci, Boulder, CO 80309 USA
Hosseinmardi, Homa
[2
]
Clauset, Aaron
论文数: 0引用数: 0
h-index: 0
机构:
Univ Colorado, Boulder, CO 80309 USA
Univ Colorado, Biofrontiers Inst, Boulder, CO 80305 USA
Santa Fe Inst, 1399 Hyde Pk Rd, Santa Fe, NM 87501 USAUniv Colorado, Dept Comp Sci, Boulder, CO 80309 USA
Clauset, Aaron
[3
,4
,5
]
机构:
[1] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[2] Univ Southern Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[3] Univ Colorado, Boulder, CO 80309 USA
[4] Univ Colorado, Biofrontiers Inst, Boulder, CO 80305 USA
[5] Santa Fe Inst, 1399 Hyde Pk Rd, Santa Fe, NM 87501 USA
Task analysis;
Clustering algorithms;
Probabilistic logic;
Bayes methods;
Detection algorithms;
Prediction algorithms;
Partitioning algorithms;
Community detection;
model selection;
overfitting;
underfitting;
link prediction;
link description;
STOCHASTIC BLOCKMODELS;
COMPLEX NETWORKS;
PREDICTION;
RESOLUTION;
SELECTION;
D O I:
10.1109/TKDE.2019.2911585
中图分类号:
TP18 [人工智能理论];
学科分类号:
081104 ;
0812 ;
0835 ;
1405 ;
摘要:
A common graph mining task is community detection, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection's No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over or underfit to real networks, or how to reliably assess such behavior across algorithms. Here, we present a broad investigation of over and underfitting across 16 state-of-the-art community detection algorithms applied to a novel benchmark corpus of 572 structurally diverse real-world networks. We find that (i) algorithms vary widely in the number and composition of communities they find, given the same input; (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks; (iii) algorithmic differences induce wide variation in accuracy on link-based learning tasks; and, (iv) no algorithm is always the best at such tasks across all inputs. Finally, we quantify each algorithm's overall tendency to over or underfit to network data using a theoretically principled diagnostic, and discuss the implications for future advances in community detection.
机构:
Chongqing Univ, Sch Automat, Chongqing 400044, Peoples R ChinaChongqing Univ, State Key Lab Power Transmiss Equipment & Syst Se, Chongqing 400044, Peoples R China
Yuan Chao
Chai Yi
论文数: 0引用数: 0
h-index: 0
机构:
Chongqing Univ, State Key Lab Power Transmiss Equipment & Syst Se, Chongqing 400044, Peoples R China
Chongqing Univ, Sch Automat, Chongqing 400044, Peoples R ChinaChongqing Univ, State Key Lab Power Transmiss Equipment & Syst Se, Chongqing 400044, Peoples R China
Chai Yi
Wei Shan-Bi
论文数: 0引用数: 0
h-index: 0
机构:
Chongqing Univ, Sch Automat, Chongqing 400044, Peoples R ChinaChongqing Univ, State Key Lab Power Transmiss Equipment & Syst Se, Chongqing 400044, Peoples R China
机构:
Univ Fed Rio Grande do Sul UFRGS, Dept Phys, Inst Fis, Av Bento Goncalves 9500,Caixa Postal 15051, BR-91501970 Porto Alegre, RS, BrazilUniv Fed Rio Grande do Sul UFRGS, Dept Phys, Inst Fis, Av Bento Goncalves 9500,Caixa Postal 15051, BR-91501970 Porto Alegre, RS, Brazil
Gamermann, Daniel
Pellizzaro, Jose Antonio
论文数: 0引用数: 0
h-index: 0
机构:
Univ Fed Rio Grande do Sul UFRGS, Dept Phys, Inst Fis, Av Bento Goncalves 9500,Caixa Postal 15051, BR-91501970 Porto Alegre, RS, BrazilUniv Fed Rio Grande do Sul UFRGS, Dept Phys, Inst Fis, Av Bento Goncalves 9500,Caixa Postal 15051, BR-91501970 Porto Alegre, RS, Brazil