Acceleration-aware, Retraining-free Evolutionary Pruning for Automated Fitment of Deep Learning Models on Edge Devices

Cited by: 1
Authors
Dutta, Jeet [1 ]
Dey, Swarnava [1 ]
Mukherjee, Arijit [1 ]
Pal, Arpan [1 ]
Affiliations
[1] TCS Research, Kolkata, West Bengal, India
Source
SECOND INTERNATIONAL CONFERENCE ON AIML SYSTEMS 2022 | 2022
Keywords
neural networks; deep learning; pruning; NAS; edge
DOI
10.1145/3564121.3564133
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep Learning architectures used in computer vision, natural language and speech processing, unsupervised clustering, etc., have become highly complex and application-specific in recent times. Despite existing automated feature engineering techniques, building such complex models still requires extensive domain knowledge or huge infrastructure for techniques such as Neural Architecture Search (NAS). Further, many industrial applications need on-premises decision-making close to the sensors, making deployment of deep learning models on edge devices a desirable and often necessary option. Instead of designing application-specific Deep Learning models from scratch, transforming already-built models can achieve faster time to market and cost reduction. In this work, we present an efficient retraining-free model compression method that searches for the best pruning hyper-parameters to reduce model size and latency without losing any accuracy. Moreover, the proposed method accounts for the drop in accuracy incurred when a Deep Neural Network is executed on accelerator hardware.
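For intuition, the sketch below illustrates the general idea of a retraining-free evolutionary search over per-layer pruning ratios, as the abstract describes. It is a minimal illustration under stated assumptions, not the paper's implementation: magnitude-based channel masking, the sparsity-based fitness function, and the names mask_channels, fitness, evolve, and val_loader are all hypothetical choices for the example, and the paper's acceleration-aware accuracy term is reduced here to a simple tolerance check against the unpruned baseline.

```python
# Minimal sketch of retraining-free evolutionary pruning (assumptions:
# a pretrained PyTorch classification `model`, a small calibration set
# `val_loader`, and magnitude-based channel masking standing in for the
# paper's actual acceleration-aware method).
import copy
import random
import torch
import torch.nn as nn

def mask_channels(model, ratios):
    """Zero the lowest-magnitude output channels of each Conv2d according
    to a per-layer pruning ratio -- no retraining is performed."""
    pruned = copy.deepcopy(model)
    convs = [m for m in pruned.modules() if isinstance(m, nn.Conv2d)]
    for conv, r in zip(convs, ratios):
        n_prune = int(r * conv.out_channels)
        if n_prune == 0:
            continue
        # Rank output channels by the L1 norm of their filters.
        norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        drop = norms.argsort()[:n_prune]
        with torch.no_grad():
            conv.weight[drop] = 0.0
            if conv.bias is not None:
                conv.bias[drop] = 0.0
    return pruned

@torch.no_grad()
def fitness(model, val_loader, base_acc, tol=0.01):
    """Reward sparsity; reject candidates whose accuracy falls more than
    `tol` below the unpruned baseline (a stand-in for the paper's
    acceleration-aware accuracy constraint)."""
    model.eval()
    correct = total = 0
    for x, y in val_loader:
        correct += (model(x).argmax(1) == y).sum().item()
        total += y.numel()
    acc = correct / total
    if acc < base_acc - tol:
        return -1.0  # infeasible: accuracy constraint violated
    zeros = sum((p == 0).sum().item() for p in model.parameters())
    numel = sum(p.numel() for p in model.parameters())
    return zeros / numel  # higher sparsity is better

def evolve(model, val_loader, base_acc, n_layers, pop_size=8, gens=10):
    """Evolutionary search over per-layer pruning ratios."""
    population = [[random.uniform(0.0, 0.5) for _ in range(n_layers)]
                  for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(gens):
        scored = []
        for ratios in population:
            f = fitness(mask_channels(model, ratios), val_loader, base_acc)
            scored.append((f, ratios))
            if f > best_fit:
                best_fit, best = f, ratios
        scored.sort(key=lambda t: t[0], reverse=True)
        elite = [r for _, r in scored[: pop_size // 2]]
        # Refill the population by Gaussian mutation of the elite.
        population = elite + [
            [min(0.9, max(0.0, r + random.gauss(0.0, 0.05))) for r in parent]
            for parent in elite]
    return best
```

In the actual method, the fitness evaluation would also reflect latency and accuracy as measured on the target accelerator; the masking above merely emulates structured pruning, and the zeroed channels would be physically removed afterwards to realize real size and latency savings.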
Pages: 10