Positive-Unlabeled Learning for Knowledge Distillation

Cited by: 0
Authors
Ning Jiang
Jialiang Tang
Wenxin Yu
Affiliations
[1] Southwest University of Science and Technology, School of Computer Science and Technology
Source
Neural Processing Letters | 2023, Vol. 55
Keywords
Convolutional neural networks; Model compression; Knowledge distillation; Positive-unlabeled learning; Attention mechanism; Soft-target
Abstract
Convolutional neural networks (CNNs) have greatly promoted the development of artificial intelligence. In general, high-performance CNNs are over-parameterized, requiring massive computation to process and predict data. This makes them unsuitable for deployment on existing resource-limited intelligent devices. In this paper, we propose an efficient model compression framework based on knowledge distillation that trains a compact student network with a large teacher network. Our key idea is to introduce a positive-unlabeled (PU) classifier that encourages the compact student network to learn the features of the prominent teacher network as closely as possible. During training, the PU classifier is trained to classify the features of the teacher network as high-quality and the features of the student network as low-quality. Simultaneously, the student network learns knowledge from the teacher network through soft-targets and attention features. Extensive experimental evaluations on four benchmark image classification datasets show that our method outperforms prior works by a large margin at the same parameter and computation cost. With VGGNet19 as the teacher network on the CIFAR datasets, the student network VGGNet13 achieves 94.47% and 75.73% accuracy on CIFAR-10 and CIFAR-100, improvements of 1.02% and 2.44%, respectively.
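The soft-target transfer mentioned in the abstract follows the standard Hinton-style formulation: the student matches the teacher's temperature-softened output distribution via a KL-divergence term. The sketch below is a plain-Python illustration of that loss, not the authors' code; the function names and the temperature value are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between teacher and student soft targets.

    Scaled by T^2 so the gradient magnitude stays comparable to a
    hard-label cross-entropy term, as is standard in distillation.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In the full framework described in the abstract, this term would be combined with an attention-transfer loss on intermediate feature maps and the adversarial signal from the PU classifier; those components are omitted here.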
Pages: 2613-2631 (18 pages)