Improving Generalizability of Graph Anomaly Detection Models via Data Augmentation

Cited by: 16
Authors
Zhou, Shuang [1]
Huang, Xiao [1]
Liu, Ninghao [2]
Zhou, Huachi [1]
Chung, Fu-Lai [1]
Huang, Long-Kai [3]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[2] Univ Georgia, Athens, GA 30602 USA
[3] Tencent AI Lab, Shenzhen 518057, Guangdong, Peoples R China
Keywords
Graph neural networks; Graph anomaly detection; Model generalizability; Data augmentation
DOI
10.1109/TKDE.2023.3271771
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Graph anomaly detection (GAD) has wide applications in real-world networked systems. In many scenarios, practitioners need to identify anomalies on new (sub)graphs but lack the labels to train an effective detection model. Since recent semi-supervised GAD methods, which leverage the available labels as prior knowledge, outperform unsupervised methods, one natural idea is to directly apply a trained semi-supervised GAD model to the new (sub)graphs. However, we find that existing semi-supervised GAD methods generalize poorly: well-trained models do not perform well on an unseen area of the graph, i.e., one that is not accessible during training. Motivated by this, we formally define the problem of generalized graph anomaly detection, which aims to effectively identify anomalies on both the training-domain graph(s) and the unseen test graph(s). This task is challenging because only limited labels are available and the normal data distribution may differ between training and testing data. Accordingly, we propose a data augmentation method named AugAN (Augmentation for Anomaly and Normal distributions) to enrich the training data, together with a customized episodic training strategy for learning with the augmented data. Extensive experiments verify the effectiveness of AugAN in improving model generalizability.
Pages: 12721-12735
Number of pages: 15