GraphSHA: Synthesizing Harder Samples for Class-Imbalanced Node Classification

Cited by: 9
Authors
Li, Wen-Zhi [1]
Wang, Chang-Dong [1]
Xiong, Hui [2, 3]
Lai, Jian-Huang [1]
Affiliations
[1] Sun Yat Sen Univ, CSE, Guangzhou, Peoples R China
[2] HKUST GZ, AI Thrust, Guangzhou, Peoples R China
[3] HKUST, CSE, Hong Kong, Peoples R China
Keywords
node classification; class imbalance; graph neural network; hard sample; data augmentation
DOI
10.1145/3580305.3599374
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Class imbalance is the phenomenon in which some classes have far fewer instances than others, and it is ubiquitous in real-world graph-structured scenarios. Recent studies find that off-the-shelf Graph Neural Networks (GNNs) under-represent minor class samples. We investigate this phenomenon and discover that the main cause of this failure is that the subspaces of minor classes are squeezed by those of the major ones in the latent space. This inspires us to enlarge the decision boundaries of minor classes, and we propose a general framework, GraphSHA, which does so by Synthesizing HArder minor samples. Furthermore, to prevent the enlarged minor boundary from violating the subspaces of neighbor classes, we also propose a module called SemiMixup, which transmits enlarged boundary information to the interior of the minor classes while blocking information propagation from minor classes to neighbor classes. Empirically, GraphSHA is effective in enlarging the decision boundaries of minor classes: it outperforms various baseline methods in class-imbalanced node classification with different GNN backbone encoders over seven public benchmark datasets. Code is available at https://github.com/wenzhilics/GraphSHA.
Pages: 1328-1340
Number of pages: 13