Evaluating Overfit and Underfit in Models of Network Community Structure

被引:92
|
作者
Ghasemian, Amir [1 ]
Hosseinmardi, Homa [2 ]
Clauset, Aaron [3 ,4 ,5 ]
机构
[1] Univ Colorado, Dept Comp Sci, Boulder, CO 80309 USA
[2] Univ Southern Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[3] Univ Colorado, Boulder, CO 80309 USA
[4] Univ Colorado, Biofrontiers Inst, Boulder, CO 80305 USA
[5] Santa Fe Inst, 1399 Hyde Pk Rd, Santa Fe, NM 87501 USA
基金
美国国家科学基金会;
关键词
Task analysis; Clustering algorithms; Probabilistic logic; Bayes methods; Detection algorithms; Prediction algorithms; Partitioning algorithms; Community detection; model selection; overfitting; underfitting; link prediction; link description; STOCHASTIC BLOCKMODELS; COMPLEX NETWORKS; PREDICTION; RESOLUTION; SELECTION;
D O I
10.1109/TKDE.2019.2911585
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
A common graph mining task is community detection, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection's No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms over or underfit to real networks, or how to reliably assess such behavior across algorithms. Here, we present a broad investigation of over and underfitting across 16 state-of-the-art community detection algorithms applied to a novel benchmark corpus of 572 structurally diverse real-world networks. We find that (i) algorithms vary widely in the number and composition of communities they find, given the same input; (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks; (iii) algorithmic differences induce wide variation in accuracy on link-based learning tasks; and, (iv) no algorithm is always the best at such tasks across all inputs. Finally, we quantify each algorithm's overall tendency to over or underfit to network data using a theoretically principled diagnostic, and discuss the implications for future advances in community detection.
引用
收藏
页码:1722 / 1735
页数:14
相关论文
共 50 条
  • [1] Exploring Temporal Community Structure via Network Embedding
    Li, Tianpeng
    Wang, Wenjun
    Jiao, Pengfei
    Wang, Yinghui
    Ding, Ruomeng
    Wu, Huaming
    Pan, Lin
    Jin, Di
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (11) : 7021 - 7033
  • [2] Inference for Network Regression Models with Community Structure
    Pan, Mengjie
    McCormick, Tyler H.
    Fosdick, Bailey K.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [3] Evolutionary Network Embedding Preserving Both Local Proximity and Community Structure
    Li, Mingming
    Liu, Jing
    Wu, Peng
    Teng, Xiangyi
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2020, 24 (03) : 523 - 535
  • [4] A Community Structure Enhancement-Based Community Detection Algorithm for Complex Networks
    Su, Yansen
    Liu, Chunlong
    Niu, Yunyun
    Cheng, Fan
    Zhang, Xingyi
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2021, 51 (05): : 2833 - 2846
  • [5] Complex network structure extraction based on community relevance
    Ding, Jingyi
    Jiaot, Licheng
    Wu, Jianshe
    Liu, Fang
    INTERNATIONAL JOURNAL OF MODERN PHYSICS C, 2020, 31 (04):
  • [6] Identifying and evaluating community structure in complex networks
    Steinhaeuser, Karsten
    Chawla, Nitesh V.
    PATTERN RECOGNITION LETTERS, 2010, 31 (05) : 413 - 421
  • [7] Heterogeneous Information Network Representation Learning Incorporating Community Structure
    Yu, Wei
    Xu, Guangquan
    Li, Xiaoming
    Chen, Xue
    Sun, Ying
    Yuan, Ning
    IEEE ACCESS, 2022, 10 : 51249 - 51260
  • [8] Evolutionary Markov Dynamics for Network Community Detection
    Wang, Zhen
    Wang, Chunyu
    Li, Xianghua
    Gao, Chao
    Li, Xuelong
    Zhu, Junyou
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (03) : 1206 - 1220
  • [9] Evaluating network models: A likelihood analysis
    Wang, Wen-Qiang
    Zhang, Qian-Ming
    Zhou, Tao
    EPL, 2012, 98 (02)
  • [10] Obfuscating Community Structure in Complex Network With Evolutionary Divide-and-Conquer Strategy
    Zhao, Jie
    Cheong, Kang Hao
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2023, 27 (06) : 1926 - 1940