Evaluating Overfit and Underfit in Models of Network Community Structure

被引:92
|
作者
Ghasemian, Amir [1 ]
Hosseinmardi, Homa [2 ]
Clauset, Aaron [3 ,4 ,5 ]
机构
[1] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[2] Univ Southern Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[3] Univ Colorado, Boulder, CO 80309 USA
[4] Univ Colorado, Biofrontiers Inst, Boulder, CO 80305 USA
[5] Santa Fe Inst, 1399 Hyde Pk Rd, Santa Fe, NM 87501 USA
基金
美国国家科学基金会;
关键词
Task analysis; Clustering algorithms; Probabilistic logic; Bayes methods; Detection algorithms; Prediction algorithms; Partitioning algorithms; Community detection; model selection; overfitting; underfitting; link prediction; link description; STOCHASTIC BLOCKMODELS; COMPLEX NETWORKS; PREDICTION; RESOLUTION; SELECTION;
D O I
10.1109/TKDE.2019.2911585
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A common graph mining task is community detection, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection's No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over or underfit to real networks, or how to reliably assess such behavior across algorithms. Here, we present a broad investigation of over and underfitting across 16 state-of-the-art community detection algorithms applied to a novel benchmark corpus of 572 structurally diverse real-world networks. We find that (i) algorithms vary widely in the number and composition of communities they find, given the same input; (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks; (iii) algorithmic differences induce wide variation in accuracy on link-based learning tasks; and, (iv) no algorithm is always the best at such tasks across all inputs. Finally, we quantify each algorithm's overall tendency to over or underfit to network data using a theoretically principled diagnostic, and discuss the implications for future advances in community detection.
引用
收藏
页码:1722 / 1735
页数:14
相关论文
共 50 条
  • [21] CDPM: Finding and Evaluating Community Structure in Social Networks
    Wan, Li
    Liao, Jianxin
    Zhu, Xiaomin
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2008, 5139 : 620 - 627
  • [22] Community Detection in Partial Correlation Network Models
    Brownlees, Christian
    Gudmundsson, Gudmundur Stefan
    Lugosi, Gabor
    JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2022, 40 (01) : 216 - 226
  • [23] Community aware random walk for network embedding
    Keikha, Mohammad Mehdi
    Rahgozar, Maseud
    Asadpour, Masoud
    KNOWLEDGE-BASED SYSTEMS, 2018, 148 : 47 - 54
  • [24] Evaluating Neural Network Models For Predicting Dynamic Signature Signals
    Zalasinski, Marcin
    Cader, Andrzej
    Patora-Wysocka, Zofia
    Xiao, Min
    JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH, 2024, 14 (04) : 361 - 372
  • [25] Closed benchmarks for network community structure characterization
    Aldecoa, Rodrigo
    Marin, Ignacio
    PHYSICAL REVIEW E, 2012, 85 (02)
  • [26] The Network Structure of Psychopathology in a Community Sample of Preadolescents
    Boschloo, Lynn
    Schoevers, Robert A.
    van Borkulo, Claudia D.
    Borsboom, Denny
    Oldehinkel, Albertine J.
    JOURNAL OF ABNORMAL PSYCHOLOGY, 2016, 125 (04) : 599 - 606
  • [27] Investigating community structure in perspective of ego network
    Biswas, Anupam
    Biswas, Bhaskar
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (20) : 6913 - 6934
  • [28] A comparative study of the measures for evaluating community structure in bipartite networks
    Wang, Xiaodong
    Liu, Jing
    INFORMATION SCIENCES, 2018, 448 : 249 - 262
  • [29] Feature Analysis and Modeling of the Network Community Structure
    Yuan Chao
    Chai Yi
    Wei Shan-Bi
    COMMUNICATIONS IN THEORETICAL PHYSICS, 2012, 58 (04) : 604 - 612
  • [30] An algorithm for network community structure determination by surprise
    Gamermann, Daniel
    Pellizzaro, Jose Antonio
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2022, 595