Narrow the Input Mismatch in Deep Graph Neural Network Distillation

Cited by: 0
Authors
Zhou, Qiqi [1 ]
Shen, Yanyan [2 ]
Chen, Lei [1 ,3 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
US National Science Foundation;
Keywords
Graph Neural Networks; Knowledge Distillation; Bayesian Optimization;
DOI
10.1145/3580305.3599442
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Graph neural networks (GNNs) have been widely studied for modeling graph-structured data. Thanks to their over-parameterization and large receptive fields, deep GNNs are a promising direction for further development and have shown superior performance. However, the over-stacked structures of deep architectures incur high inference cost in deployment. To compress deep GNNs, knowledge distillation (KD) can be used to make shallow student GNNs mimic teacher GNNs. Existing KD methods in the graph domain focus on constructing diverse supervision over the embeddings or predictions produced by student GNNs, but overlook the gap in receptive field (i.e., input information) between student and teacher, which makes distillation difficult. We call this gap "input mismatch". To alleviate this problem, we propose a lightweight stochastic extension module that estimates the missing input information for student GNNs. The estimator models the distribution of the missing information: specifically, we model it both as an input-independent distribution at the graph level and as a conditional distribution at the node level (conditioned on the observable input). These two estimates are optimized using a Bayesian methodology and combined into a balanced estimate that serves as additional input to the student GNNs. To the best of our knowledge, we are the first to address the "input mismatch" problem in deep GNN distillation. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing KD methods for GNNs in distillation performance, confirming that the estimates are reasonable and effective.
Pages: 3581-3592
Page count: 12
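
The abstract describes the missing-input estimator only at a conceptual level. Below is a minimal, hypothetical PyTorch sketch of that idea, assuming a Gaussian form for both the graph-level (input-independent) and node-level (feature-conditioned) estimates and a simple learned scalar to balance them; the paper itself uses a Bayesian methodology for this combination. All class, parameter, and dimension names here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the "missing input" estimator idea: model the part of
# the teacher's receptive field that the shallow student cannot see as a blend
# of (a) a graph-level, input-independent distribution and (b) a node-level
# distribution conditioned on the observable node features. Names are illustrative.
import torch
import torch.nn as nn


class MissingInputEstimator(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        # (a) Graph-level estimate: mean / log-variance shared by all nodes.
        self.graph_mu = nn.Parameter(torch.zeros(feat_dim))
        self.graph_logvar = nn.Parameter(torch.zeros(feat_dim))
        # (b) Node-level estimate: predicted from each node's observed features.
        self.node_net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * feat_dim),
        )
        # Balance between the two estimates; a plain learned scalar here,
        # whereas the paper optimizes this trade-off in a Bayesian manner.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x_obs: torch.Tensor) -> torch.Tensor:
        n, d = x_obs.shape
        # Node-conditional mean / log-variance from the observed features.
        node_mu, node_logvar = self.node_net(x_obs).chunk(2, dim=-1)
        # Blend graph-level and node-level estimates into one Gaussian.
        a = torch.sigmoid(self.alpha)
        mu = a * self.graph_mu.expand(n, d) + (1 - a) * node_mu
        logvar = a * self.graph_logvar.expand(n, d) + (1 - a) * node_logvar
        # Reparameterized sample: a stochastic estimate of the missing input.
        eps = torch.randn_like(mu)
        return mu + eps * torch.exp(0.5 * logvar)


# Usage: concatenate the estimate with the observed features before feeding
# the shallow student GNN, so its effective input better matches the teacher's
# larger receptive field.
if __name__ == "__main__":
    x_obs = torch.randn(100, 32)                          # observed node features
    est = MissingInputEstimator(32)
    x_student = torch.cat([x_obs, est(x_obs)], dim=-1)    # augmented student input
    print(x_student.shape)                                # torch.Size([100, 64])
```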