SORAG: Synthetic Data Over-Sampling Strategy on Multi-Label Graphs

Cited by: 3

Authors:

Duan, Yijun [1]

Liu, Xin [1]

Jatowt, Adam [2]

Yu, Hai-tao [3]

Lynden, Steven [1]

Kim, Kyoung-Sook [1]

Matono, Akiyoshi [1]

Affiliations:

[1] Natl Inst Adv Ind Sci & Technol Tokyo Waterfront, 2 Chome 3-26 Aomi, Tokyo 1350064, Japan

[2] Univ Innsbruck, Dept Comp Sci, Innrain 52, A-6020 Innsbruck, Austria

[3] Univ Tsukuba, Fac Lib Informat & Media Sci, 1 Chome 1-1 Tennodai, Tsukuba, Ibaraki 3058577, Japan

Source:

REMOTE SENSING | 2022 / Vol. 14 / Issue 18

Keywords:

imbalanced data classification; data over-sampling; generative adversarial network; graph convolutional network; semi-supervised learning; remote sensing; SMOTE

DOI:

10.3390/rs14184479

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science]

Discipline classification codes:

08; 0830

Abstract:

In many real-world networks of interest in the field of remote sensing (e.g., public transport networks), nodes are associated with multiple labels, and node classes are imbalanced; that is, some classes have significantly fewer samples than others. However, the research problem of imbalanced multi-label graph node classification remains unexplored. This non-trivial task challenges existing graph neural networks (GNNs) because the majority class can dominate the loss functions of GNNs, causing them to overfit to the features and label correlations of the majority class. On non-graph data, minority over-sampling methods (such as the synthetic minority over-sampling technique and its variants) have been demonstrated to be effective for the imbalanced data classification problem. This study proposes and validates a new hypothesis concerning unlabeled data over-sampling: although it is meaningless for imbalanced non-graph data, on graphs the feature propagation and topological interplay mechanisms between nodes allow it to facilitate the representation learning of imbalanced graphs. Furthermore, we determine empirically that ensemble data synthesis, which creates virtual minority samples in the central region of a minority class and generates virtual unlabeled samples in the boundary region between a minority class and a majority class, is the best practice for the imbalanced multi-label graph node classification task. Our proposed novel data over-sampling framework is evaluated using multiple real-world network datasets, and it outperforms diverse, strong benchmark models by a large margin.
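As background for the abstract's reference to the synthetic minority over-sampling technique (SMOTE) on non-graph data, the following is a minimal sketch of the classic interpolation idea only. It is not the SORAG framework described in the abstract. The function name `smote_sketch`, its parameters, and the toy data are all hypothetical and chosen for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Illustrative SMOTE-style interpolation (hypothetical helper, not SORAG).

    Assumes `minority` holds at least two feature tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        base = rng.choice(minority)
        # Find its k nearest minority neighbours (excluding itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        # Place the synthetic point on the segment between base and mate.
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=5)
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies on the segment between them. On a graph, a synthetic node would additionally need edges, a step this sketch does not attempt.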


Pages: 25
