Resource efficient AI: Exploring neural network pruning for task specialization

Cited by: 3
Authors
Balemans, Dieter [1,2]
Reiter, Philippe [1]
Steckel, Jan [2,3]
Hellinckx, Peter [1]
Affiliations
[1] Univ Antwerp, Fac Appl Engn IDLab, IMEC, Sint Pietersvliet 7, B-2000 Antwerp, Belgium
[2] Univ Antwerp, Fac Appl Engn, CoSysLab, Groenenborgerlaan 171, B-2020 Antwerp, Belgium
[3] Flanders Make, Strateg Res Ctr, Oude Diestersebaan 133, B-3920 Lommel, Belgium
Keywords
Neural network compression; Machine learning; Explainable AI; Neural network pruning; Edge inference; GRADIENT
DOI
10.1016/j.iot.2022.100599
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
This paper explores the use of neural network pruning in transfer learning applications to achieve more resource-efficient inference. The goal is to focus and optimize a neural network on a smaller, specialized target task. With the advent of the IoT, we have seen an immense increase in AI-based applications on mobile and embedded devices, such as wearables and other smart appliances. However, with the ever-increasing complexity and capabilities of machine learning algorithms, this push to the edge has led to new challenges due to the constraints imposed by the limited resources available on these devices. Some form of compression is needed for state-of-the-art convolutional neural networks to run on edge devices. In this work, we adapt existing neural network pruning methods so that they can specialize networks to focus on only a subset of what they were originally trained for. This is a transfer learning use-case in which we optimize large pre-trained networks. It differs from standard optimization techniques in that the network is allowed to forget certain concepts, which allows its footprint to become even smaller. We compare different pruning criteria, including one from the field of Explainable AI (XAI), to determine which technique yields the smallest possible network while maintaining high performance on the target task. Our results show the benefits of network specialization when executing neural networks on embedded devices, both with and without GPU acceleration.
Pages: 11
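To make the specialization idea in the abstract concrete, the following is a minimal sketch, assuming a PyTorch model and a DataLoader that yields only the target-task classes. The function names (average_channel_activation, specialize_by_activation) and the mean-activation pruning criterion are illustrative assumptions, not the paper's method; the paper compares several criteria, including an XAI-derived one, whereas this sketch simply ranks convolutional filters by their average response to target-task data and masks the weakest ones with torch.nn.utils.prune.

```python
# Illustrative sketch of task-specialization pruning; not the paper's method.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune


def average_channel_activation(model, loader, device="cpu"):
    """Accumulate mean |activation| per Conv2d output channel, measured only
    on target-task data, via forward hooks. Used only for ranking filters,
    so the accumulated (unnormalized) sums are sufficient."""
    stats, hooks = {}, []

    def make_hook(name):
        def hook(_module, _inp, out):
            # Conv2d output is (N, C, H, W); reduce over batch and spatial dims.
            act = out.detach().abs().mean(dim=(0, 2, 3))
            stats[name] = stats.get(name, 0) + act
        return hook

    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d):
            hooks.append(m.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        for x, _ in loader:          # loader yields only target-task classes
            model(x.to(device))

    for h in hooks:
        h.remove()
    return stats


def specialize_by_activation(model, loader, amount=0.5):
    """Mask out the fraction `amount` of conv filters that respond least
    to the target subset, i.e. let the network 'forget' unused concepts."""
    stats = average_channel_activation(model, loader)
    for name, m in model.named_modules():
        if isinstance(m, nn.Conv2d) and name in stats:
            k = int(amount * m.out_channels)
            weak = torch.topk(stats[name], k, largest=False).indices
            mask = torch.ones(m.out_channels)
            mask[weak] = 0.0
            # Structured mask over output channels (dim 0 of the weight tensor).
            prune.custom_from_mask(
                m, "weight", mask.view(-1, 1, 1, 1).expand_as(m.weight))
    return model
```

In practice the masked network would then be fine-tuned on the target subset and the zeroed filters physically removed, since the masks from torch.nn.utils.prune only zero weights (prune.remove makes a mask permanent but does not shrink tensors); removing the filters is where the footprint and latency savings on embedded devices would actually come from.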