Narrow the Input Mismatch in Deep Graph Neural Network Distillation

Cited by: 0
Authors
Zhou, Qiqi [1 ]
Shen, Yanyan [2 ]
Chen, Lei [1 ,3 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[3] Hong Kong Univ Sci & Technol Guangzhou, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023 | 2023
Funding
US National Science Foundation;
Keywords
Graph Neural Networks; Knowledge Distillation; Bayesian Optimization;
DOI
10.1145/3580305.3599442
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Graph neural networks (GNNs) have been widely studied for modeling graph-structured data. Thanks to their over-parameterization and large receptive fields, deep GNNs are a promising direction for further development and have shown superior performance. However, the over-stacked structures of deep architectures incur high inference cost in deployment. To compress deep GNNs, knowledge distillation (KD) can be used to make shallow student GNNs mimic teacher GNNs. Existing KD methods in the graph domain focus on constructing diverse supervision over the embeddings or predictions produced by student GNNs, but overlook the gap in receptive field (i.e., input information) between student and teacher, which makes distillation difficult. We call this gap "input mismatch". To alleviate this problem, we propose a lightweight stochastic extension module that estimates the missing input information for student GNNs. The estimator models the distribution of the missing information: specifically, we model it both as an input-independent distribution at the graph level and as a conditional distribution at the node level (conditioned on the observable input). These two estimates are optimized using a Bayesian methodology and combined into a balanced estimate that serves as additional input to the student GNNs. To the best of our knowledge, we are the first to address the "input mismatch" problem in deep GNN distillation. Extensive experiments on benchmark datasets demonstrate that our method outperforms existing KD methods for GNNs in distillation performance, confirming that the estimates are reasonable and effective.
Pages: 3581-3592
Page count: 12
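
The abstract describes the missing-input estimator only at a conceptual level. Below is a minimal, hypothetical PyTorch sketch of that idea, assuming a Gaussian form for both the graph-level (input-independent) and node-level (feature-conditioned) estimates and a simple learned scalar to balance them; the paper itself uses a Bayesian methodology for this combination. All class, parameter, and dimension names here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the "missing input" estimator idea: model the part of
# the teacher's receptive field that the shallow student cannot see as a blend
# of (a) a graph-level, input-independent distribution and (b) a node-level
# distribution conditioned on the observable node features. Names are illustrative.
import torch
import torch.nn as nn


class MissingInputEstimator(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        # (a) Graph-level estimate: mean / log-variance shared by all nodes.
        self.graph_mu = nn.Parameter(torch.zeros(feat_dim))
        self.graph_logvar = nn.Parameter(torch.zeros(feat_dim))
        # (b) Node-level estimate: predicted from each node's observed features.
        self.node_net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2 * feat_dim),
        )
        # Balance between the two estimates; a plain learned scalar here,
        # whereas the paper optimizes this trade-off in a Bayesian manner.
        self.alpha = nn.Parameter(torch.tensor(0.5))

    def forward(self, x_obs: torch.Tensor) -> torch.Tensor:
        n, d = x_obs.shape
        # Node-conditional mean / log-variance from the observed features.
        node_mu, node_logvar = self.node_net(x_obs).chunk(2, dim=-1)
        # Blend graph-level and node-level estimates into one Gaussian.
        a = torch.sigmoid(self.alpha)
        mu = a * self.graph_mu.expand(n, d) + (1 - a) * node_mu
        logvar = a * self.graph_logvar.expand(n, d) + (1 - a) * node_logvar
        # Reparameterized sample: a stochastic estimate of the missing input.
        eps = torch.randn_like(mu)
        return mu + eps * torch.exp(0.5 * logvar)


# Usage: concatenate the estimate with the observed features before feeding
# the shallow student GNN, so its effective input better matches the teacher's
# larger receptive field.
if __name__ == "__main__":
    x_obs = torch.randn(100, 32)                          # observed node features
    est = MissingInputEstimator(32)
    x_student = torch.cat([x_obs, est(x_obs)], dim=-1)    # augmented student input
    print(x_student.shape)                                # torch.Size([100, 64])
```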