Many-objective evolutionary self-knowledge distillation with adaptive branch fusion method

Times Cited: 2
Authors
Bai, Jiayuan [1 ]
Zhang, Yi [2 ]
Affiliations
[1] Hong Kong Polytech Univ, Fac Engn, Hung Hom, Kowloon, Hong Kong, Peoples R China
[2] Jilin Jianzhu Univ, Coll Elect & Comp Sci, 5088 Xincheng St, Changchun 130118, Jilin, Peoples R China
Keywords
Model compression; Self-knowledge distillation; Many-objective evolutionary optimization; Knowledge fusion; ALGORITHM; MOEA/D;
DOI
10.1016/j.ins.2024.120586
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
As a new technique for model compression, self-knowledge distillation (SKD) avoids the large computational overhead of training a teacher model that traditional knowledge distillation incurs. However, existing SKD methods focus on knowledge transfer between the deep and shallow layers of the network while ignoring mutual learning among the shallow branches. This paper proposes a many-objective evolutionary self-knowledge distillation framework (MaOESKD) to guide knowledge fusion between branches in the SKD neural network. The framework embeds an optimization module and a temporary branch into the multi-branch SKD network. The optimization module comprises a many-objective adaptive weight optimization model (MaAWOM) and a many-objective evolutionary optimization algorithm based on a multi-strategy consensus mechanism (MaOEA-MCM); the temporary branch performs linear weighted fusion. In the MaAWOM, the weights of the different branches in knowledge fusion are the decision variables, and the mutual information, covariance, and KL divergence between branch output features, together with the total information of each branch, are the optimization objectives. The MaOEA-MCM integrates several state-of-the-art individual selection strategies from the evolutionary algorithm field: shift-based density estimation (SDE), penalized boundary intersection (PBI), balanced fitness estimation (BFE), and adaptive position transformation (APT). MaOESKD reaches accuracies of 99.70%, 95.74%, and 78.21% on MNIST, CIFAR-10, and CIFAR-100, respectively.
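The linear weighted fusion and the KL-divergence objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the convex normalization of the weights, and the toy data below are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code): a candidate weight vector fuses
# per-branch logits into a temporary branch, and a KL-divergence-style
# objective scores the fused output against a reference branch.
import numpy as np

def fuse_branches(branch_logits, weights):
    """Linear weighted fusion of per-branch logits into a temporary branch.

    branch_logits: list of (batch, num_classes) arrays, one per shallow branch.
    weights: fusion weights (the decision variables), one per branch.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()          # assumed convex normalization
    stacked = np.stack(branch_logits, axis=0)  # (num_branches, batch, classes)
    return np.tensordot(weights, stacked, axes=1)  # (batch, classes)

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch; one possible fusion objective."""
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

# Toy usage: three shallow branches fused and compared with the deepest branch.
rng = np.random.default_rng(0)
branches = [rng.normal(size=(8, 10)) for _ in range(3)]  # fake branch logits
deep = rng.normal(size=(8, 10))                          # fake deepest-branch logits
fused = fuse_branches(branches, weights=[0.5, 0.3, 0.2])
print("KL(fused || deep):", kl_divergence(softmax(fused), softmax(deep)))
```

In this sketch the fusion weights play the role of the decision variables that MaOEA-MCM would evolve; each candidate weight vector would be evaluated against objectives such as the KL divergence computed above, alongside the mutual-information and covariance terms described in the abstract.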
Pages: 13