Compressing Deep Graph Neural Networks via Adversarial Knowledge Distillation

Times cited: 20
Authors
He, Huarui [1]
Wang, Jie [1, 2]
Zhang, Zhanqiu [1]
Wu, Feng [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Keywords
Graph Neural Networks; Knowledge Distillation; Adversarial Training; Network Compression
DOI
10.1145/3534678.3539315
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Deep graph neural networks (GNNs) have been shown to be expressive for modeling graph-structured data. Nevertheless, the over-stacked architecture of deep graph models makes it difficult to deploy and rapidly test on mobile or embedded systems. To compress over-stacked GNNs, knowledge distillation via a teacher-student architecture turns out to be an effective technique, where the key step is to measure the discrepancy between teacher and student networks with predefined distance functions. However, using the same distance for graphs of various structures may be unfit, and the optimal distance formulation is hard to determine. To tackle these problems, we propose a novel Adversarial Knowledge Distillation framework for graph models named GraphAKD, which adversarially trains a discriminator and a generator to adaptively detect and decrease the discrepancy. Specifically, noticing that the well-captured inter-node and inter-class correlations favor the success of deep GNNs, we propose to criticize the inherited knowledge from node-level and class-level views with a trainable discriminator. The discriminator distinguishes between teacher knowledge and what the student inherits, while the student GNN works as a generator and aims to fool the discriminator. Experiments on node-level and graph-level classification benchmarks demonstrate that GraphAKD improves the student performance by a large margin. The results imply that GraphAKD can precisely transfer knowledge from a complicated teacher GNN to a compact student GNN.
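The adversarial setup described in the abstract can be illustrated with a minimal sketch. It assumes a standard GAN-style binary cross-entropy objective and a hypothetical linear discriminator over a single node's class logits; the names `discriminator_score` and `adversarial_losses` are illustrative and are not taken from the paper, and GraphAKD's actual discriminators and loss terms may differ.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def discriminator_score(logits, weights, bias):
    """Hypothetical linear discriminator (an assumption, not the paper's model).

    Returns the estimated probability that `logits` came from the teacher.
    """
    return sigmoid(sum(w * z for w, z in zip(weights, logits)) + bias)


def adversarial_losses(teacher_logits, student_logits, weights, bias):
    """Standard GAN-style losses for one node (assumed objective).

    The discriminator is rewarded for scoring teacher logits high and
    student logits low; the student (generator) is rewarded for making
    its logits look like the teacher's.
    """
    eps = 1e-12  # guards against log(0)
    d_teacher = discriminator_score(teacher_logits, weights, bias)
    d_student = discriminator_score(student_logits, weights, bias)
    d_loss = -(math.log(d_teacher + eps) + math.log(1.0 - d_student + eps))
    g_loss = -math.log(d_student + eps)
    return d_loss, g_loss


# Toy logits for a 3-class node; the values are made up for illustration.
teacher = [2.0, -1.0, 0.5]
student = [0.3, 0.2, 0.1]
d_loss, g_loss = adversarial_losses(teacher, student, [1.0, -1.0, 0.0], 0.0)
```

In a training loop the two losses would be minimized alternately: one step updates the discriminator parameters on `d_loss`, and the next updates the student on `g_loss`, usually alongside the ordinary supervised loss.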
Pages: 534-544
Page count: 11