Mastering Long-Tail Complexity on Graphs: Characterization, Learning, and Generalization

被引:3
作者
Wang, Haohui [1 ]
Jing, Baoyu [2 ]
Ding, Kaize [3 ]
Zhu, Yada [4 ]
Cheng, Wei [5 ]
Zhang, Si [6 ]
Fan, Yonghui [7 ]
Zhang, Liqing [1 ]
Zhou, Dawei [1 ]
机构
[1] Virginia Tech, Blacksburg, VA 24061 USA
[2] Univ Illinois, Urbana, IL USA
[3] Northwestern Univ, Evanston, IL USA
[4] IBM Res, Yorktown Hts, NY USA
[5] NEC Labs Amer, Princeton, NJ USA
[6] Meta, Sunnyvale, CA USA
[7] Amazon AGI, Sunnyvale, CA USA
来源
PROCEEDINGS OF THE 30TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2024 | 2024年
基金
美国国家科学基金会;
关键词
Long-tail Learning; Generalization; Graph Mining;
D O I
10.1145/3637528.3671880
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In the context of long-tail classification on graphs, the vast majority of existing work primarily revolves around the development of model debiasing strategies, intending to mitigate class imbalances and enhance the overall performance. Despite the notable success, there is very limited literature that provides a theoretical tool for characterizing the behaviors of long-tail classes in graphs and gaining insight into generalization performance in real-world scenarios. To bridge this gap, we propose a generalization bound for long-tail classification on graphs by formulating the problem in the fashion of multi-task learning, i.e., each task corresponds to the prediction of one particular class. Our theoretical results show that the generalization performance of long-tail classification is dominated by the overall loss range and the task complexity. Building upon the theoretical findings, we propose a novel generic framework HIERTAIL for long-tail classification on graphs. In particular, we start with a hierarchical task grouping module that allows us to assign related tasks into hypertasks and thus control the complexity of the task space; then, we further design a balanced contrastive learning module to adaptively balance the gradients of both head and tail classes to control the loss range across all tasks in a unified fashion. Extensive experiments demonstrate the effectiveness of HIERTAIL in characterizing long-tail classes on real graphs, which achieves up to 12.9% improvement over the leading baseline method in balanced accuracy.
引用
收藏
页码:3045 / 3056
页数:12
相关论文
共 69 条
[1]   Graph based anomaly detection and description: a survey [J].
Akoglu, Leman ;
Tong, Hanghang ;
Koutra, Danai .
DATA MINING AND KNOWLEDGE DISCOVERY, 2015, 29 (03) :626-688
[2]   Deep Over-sampling Framework for Classifying Imbalanced Data [J].
Ando, Shin ;
Huang, Chun Yuan .
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT I, 2017, 10534 :770-785
[3]  
[Anonymous], 2019, PMLR
[4]  
Bartlett P. L., 2003, Journal of Machine Learning Research, V3, P463, DOI 10.1162/153244303321897690
[5]  
Bojchevski A., 2018, ICLR
[6]  
Cao KD, 2019, ADV NEUR IN, V32
[7]  
Chawla NiteshV., 2003, Proceedings of the ICML, V3
[8]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
2002, American Association for Artificial Intelligence (16)
[9]   Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters [J].
Dou, Yingtong ;
Liu, Zhiwei ;
Sun, Li ;
Deng, Yutong ;
Peng, Hao ;
Yu, Philip S. .
CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, :315-324
[10]  
Elkan Charles., 2001, P 17 INT JOINT C ART, P973, DOI [DOI 10.5555/1642194.1642224, DOI 10.1007/978-3-642-55498-8_24]