SORAG: Synthetic Data Over-Sampling Strategy on Multi-Label Graphs

Cited by: 3
Authors
Duan, Yijun [1]
Liu, Xin [1]
Jatowt, Adam [2]
Yu, Hai-tao [3]
Lynden, Steven [1]
Kim, Kyoung-Sook [1]
Matono, Akiyoshi [1]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol Tokyo Waterfront, 2 Chome 3-26 Aomi, Tokyo 1350064, Japan
[2] Univ Innsbruck, Dept Comp Sci, Innrain 52, A-6020 Innsbruck, Austria
[3] Univ Tsukuba, Fac Lib Informat & Media Sci, 1 Chome 1-1 Tennodai, Tsukuba, Ibaraki 3058577, Japan
Keywords
imbalanced data classification; data over-sampling; generative adversarial network; graph convolutional network; semi-supervised learning; remote sensing; SMOTE
DOI
10.3390/rs14184479
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
In many real-world networks of interest in the field of remote sensing (e.g., public transport networks), nodes are associated with multiple labels, and node classes are imbalanced; that is, some classes have significantly fewer samples than others. However, the research problem of imbalanced multi-label graph node classification remains unexplored. This non-trivial task challenges the existing graph neural networks (GNNs) because the majority class can dominate the loss functions of GNNs and result in the overfitting of the majority class features and label correlations. On non-graph data, minority over-sampling methods (such as the synthetic minority over-sampling technique and its variants) have been demonstrated to be effective for the imbalanced data classification problem. This study proposes and validates a new hypothesis with unlabeled data over-sampling, which is meaningless for imbalanced non-graph data; however, feature propagation and topological interplay mechanisms between graph nodes can facilitate the representation learning of imbalanced graphs. Furthermore, we determine empirically that ensemble data synthesis through the creation of virtual minority samples in the central region of a minority and generation of virtual unlabeled samples in the boundary region between a minority and majority is the best practice for the imbalanced multi-label graph node classification task. Our proposed novel data over-sampling framework is evaluated using multiple real-world network datasets, and it outperforms diverse, strong benchmark models by a large margin.
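For readers unfamiliar with the synthetic minority over-sampling technique mentioned in the abstract, the following is a minimal sketch of the classic SMOTE-style interpolation step on non-graph data. It is not the SORAG framework itself, which the keywords indicate is built on generative adversarial and graph convolutional networks. The function name smote_like_oversample, its parameters, and the toy data are assumptions made for illustration only.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Choose a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square (assumed data).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(minority, n_new=5)
```

Because each synthetic point lies on a segment between two existing minority samples, it stays inside the region the minority class already occupies. The abstract's proposal differs in that it additionally generates virtual unlabeled samples in the boundary region between a minority and a majority class, which this sketch does not attempt.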
Pages: 25