Evaluating Overfit and Underfit in Models of Network Community Structure

被引:92
|
作者
Ghasemian, Amir [1 ]
Hosseinmardi, Homa [2 ]
Clauset, Aaron [3 ,4 ,5 ]
机构
[1] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[2] Univ Southern Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[3] Univ Colorado, Boulder, CO 80309 USA
[4] Univ Colorado, Biofrontiers Inst, Boulder, CO 80305 USA
[5] Santa Fe Inst, 1399 Hyde Pk Rd, Santa Fe, NM 87501 USA
基金
美国国家科学基金会;
关键词
Task analysis; Clustering algorithms; Probabilistic logic; Bayes methods; Detection algorithms; Prediction algorithms; Partitioning algorithms; Community detection; model selection; overfitting; underfitting; link prediction; link description; STOCHASTIC BLOCKMODELS; COMPLEX NETWORKS; PREDICTION; RESOLUTION; SELECTION;
D O I
10.1109/TKDE.2019.2911585
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A common graph mining task is community detection, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection's No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over or underfit to real networks, or how to reliably assess such behavior across algorithms. Here, we present a broad investigation of over and underfitting across 16 state-of-the-art community detection algorithms applied to a novel benchmark corpus of 572 structurally diverse real-world networks. We find that (i) algorithms vary widely in the number and composition of communities they find, given the same input; (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks; (iii) algorithmic differences induce wide variation in accuracy on link-based learning tasks; and, (iv) no algorithm is always the best at such tasks across all inputs. Finally, we quantify each algorithm's overall tendency to over or underfit to network data using a theoretically principled diagnostic, and discuss the implications for future advances in community detection.
引用
收藏
页码:1722 / 1735
页数:14
相关论文
共 50 条
  • [41] Hierarchical community structure preserving approach for network embedding
    Duan, Zhen
    Sun, Xian
    Zhao, Shu
    Chen, Jie
    Zhang, Yanping
    Tang, Jie
    INFORMATION SCIENCES, 2021, 546 : 1084 - 1096
  • [42] Network Representation Learning Guided by Partial Community Structure
    Sun, Hanlin
    Jie, Wei
    Wang, Zhongmin
    Wang, Hai
    Ma, Sugang
    IEEE ACCESS, 2020, 8 : 46665 - 46681
  • [43] The community structure identification for the Chinese merger and acquisition network
    Guo, Xin-Yu
    Yang, Kai
    Wu, Xian-Ming
    Guo, Qiang
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2019, 526
  • [44] Network community structure of substorms using SuperMAG magnetometers
    Orr, L.
    Chapman, S. C.
    Gjerloev, J. W.
    Guo, W.
    NATURE COMMUNICATIONS, 2021, 12 (01)
  • [45] Enhancing network capacity by weakening community structure in scale-free network
    Cai, Jun
    Wang, Yu
    Liu, Yan
    Luo, Jian-Zhen
    Wei, Wenguo
    Xu, Xiaoping
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 87 : 765 - 771
  • [46] Evaluating and comparing the igraph community detection algorithms
    de Sousa, Fabiano Berardo
    Zhao, Liang
    2014 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2014, : 408 - 413
  • [47] Community structure detection algorithm based on link prediction
    Dai G.
    Wang Q.
    Xu B.
    Sun L.
    International Journal of Information and Communication Technology, 2021, 19 (04) : 432 - 448
  • [48] Detecting community structure using biased random merging
    Liu, Xu
    Forrest, Jeffrey Yi-Lin
    Luo, Qiang
    Yi, Dong-Yun
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2012, 391 (04) : 1797 - 1810
  • [49] An information gain-based approach for evaluating protein structure models
    Postic, Guillaume
    Janel, Nathalie
    Tuffery, Pierre
    Moroy, Gautier
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2020, 18 : 2228 - 2236
  • [50] Network structure exploration via Bayesian nonparametric models
    Chen, Y.
    Wang, X. L.
    Xiang, X.
    Tang, B. Z.
    Bu, J. Z.
    JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2015,