NodeAug: Semi-Supervised Node Classification with Data Augmentation

Cited by: 94
Authors
Wang, Yiwei [1 ]
Wang, Wei [1 ]
Liang, Yuxuan [1 ]
Cai, Yujun [1 ]
Liu, Juncheng [1 ]
Hooi, Bryan [1 ]
Affiliation
[1] Nanyang Technol Univ, Singapore, Singapore
Source
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING | 2020
Keywords
graph convolutional networks; data augmentation; graph mining; semi-supervised learning
DOI
10.1145/3394486.3403063
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a new method that uses Data Augmentation (DA) to enhance Graph Convolutional Networks (GCNs), the state-of-the-art models for semi-supervised node classification. DA for graph data remains under-explored: because nodes are connected by edges, DA applied to different nodes influences one another and leads to undesired results, such as uncontrollable DA magnitudes and changes to the ground-truth labels. To address this issue, we present the NodeAug (Node-Parallel Augmentation) scheme, which creates a 'parallel universe' for each node in which to conduct DA, blocking the undesired effects from other nodes. NodeAug regularizes the prediction of every node (including unlabeled ones) to be invariant to the changes induced by DA, which improves its effectiveness. To augment the input from different aspects, we propose three DA strategies that modify both the node attributes and the graph structure. In addition, we introduce subgraph mini-batch training for an efficient implementation of NodeAug: each iteration takes as input the subgraph corresponding to the receptive fields of a batch of nodes, rather than the whole graph used by prior full-batch training. Empirically, NodeAug yields significant gains for strong GCN models on Cora, Citeseer, Pubmed, and two co-authorship networks, with a more efficient training process thanks to the proposed subgraph mini-batch training approach.
Pages: 207-217
Number of pages: 11
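
The abstract describes two ideas that can be illustrated concretely: a consistency loss that makes each node's prediction invariant to DA, and augmentations that modify both node attributes and graph structure. Below is a minimal, self-contained PyTorch sketch of that training objective, not the authors' implementation: TinyGCN, augment, and nodeaug_step are hypothetical names, a single random attribute-masking plus edge-dropping augmentation stands in for the paper's three DA strategies, and full-batch training is used instead of the paper's subgraph mini-batch scheme.

```python
# Minimal sketch of NodeAug-style consistency training (assumptions noted above).
import torch
import torch.nn.functional as F


class TinyGCN(torch.nn.Module):
    """Two-layer GCN on a dense, symmetrically normalised adjacency matrix."""

    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hid_dim)
        self.lin2 = torch.nn.Linear(hid_dim, num_classes)

    def forward(self, x, adj):
        # A_hat = D^{-1/2} (A + I) D^{-1/2}
        a = adj + torch.eye(adj.size(0), device=adj.device)
        deg = a.sum(dim=1).clamp(min=1.0).pow(-0.5)
        a_norm = deg.unsqueeze(1) * a * deg.unsqueeze(0)
        h = F.relu(a_norm @ self.lin1(x))
        return a_norm @ self.lin2(h)


def augment(x, adj, p_attr=0.1, p_edge=0.1):
    """Hypothetical stand-in for the paper's DA strategies:
    randomly mask node attributes and randomly drop edges (kept symmetric)."""
    x_aug = x * (torch.rand_like(x) > p_attr).float()
    mask = (torch.rand_like(adj) > p_edge).float()
    adj_aug = adj * mask * mask.t()
    return x_aug, adj_aug


def nodeaug_step(model, x, adj, labels, train_mask, optimizer, lam=1.0):
    """One training step: supervised loss on labelled nodes plus a KL consistency
    loss pushing every node's prediction on the augmented graph towards its
    (detached) prediction on the original graph."""
    model.train()
    optimizer.zero_grad()
    logits = model(x, adj)                        # original graph
    x_aug, adj_aug = augment(x, adj)
    logits_aug = model(x_aug, adj_aug)            # the node's 'parallel universe'

    sup_loss = F.cross_entropy(logits[train_mask], labels[train_mask])
    target = F.softmax(logits, dim=-1).detach()   # consistency target, all nodes
    cons_loss = F.kl_div(F.log_softmax(logits_aug, dim=-1), target,
                         reduction="batchmean")
    loss = sup_loss + lam * cons_loss
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy random graph, only to show that the training step runs end to end.
    n, d, c = 100, 16, 3
    x = torch.randn(n, d)
    adj = (torch.rand(n, n) < 0.05).float()
    adj = ((adj + adj.t()) > 0).float()
    adj.fill_diagonal_(0.0)
    labels = torch.randint(0, c, (n,))
    train_mask = torch.zeros(n, dtype=torch.bool)
    train_mask[:10] = True

    model = TinyGCN(d, 32, c)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    for _ in range(5):
        print(nodeaug_step(model, x, adj, labels, train_mask, opt))
```

Detaching the original-graph prediction used as the KL target keeps the regularizer from dragging the clean prediction toward the augmented one, so only the augmented branch is pulled into agreement; the consistency term is applied to all nodes, labelled and unlabelled, in line with the semi-supervised setting described in the abstract.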