Deep Learning architectures used in computer vision, natural language processing, speech processing, unsupervised clustering, etc. have become highly complex and application-specific in recent years. Despite existing automated feature engineering techniques, building such complex models still requires extensive domain knowledge or massive compute infrastructure for techniques such as Neural Architecture Search (NAS). Further, many industrial applications need on-premises decision-making close to the sensors, making the deployment of Deep Learning models on edge devices a desirable and often necessary option. Instead of designing application-specific Deep Learning models from scratch, transforming already built models can achieve a faster time to market and reduce costs. In this work, we present an efficient, re-training-free model compression method that searches for the best compression hyper-parameters to reduce model size and latency without any loss in accuracy. Moreover, the proposed method accounts for the additional drop in accuracy that can occur when a Deep Neural Network is executed on accelerator hardware.
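The abstract does not specify which compression hyper-parameters are searched. As a minimal illustrative sketch only (not the authors' method), the following assumes a per-tensor search over two common hyper-parameters, a magnitude-pruning ratio and a uniform-quantization bit-width, and uses reconstruction fidelity as a hypothetical stand-in for the accuracy check; the functions `compress` and `fidelity` and the 0.95 tolerance are invented for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained layer's weights; in practice these would
# come from the pre-trained network being compressed.
weights = rng.normal(size=(256, 256))

def compress(w, prune_ratio, bits):
    """Magnitude-prune a fraction of weights, then uniformly quantize."""
    w = w.copy()
    threshold = np.quantile(np.abs(w), prune_ratio)
    w[np.abs(w) < threshold] = 0.0
    # Uniform symmetric quantization to the given bit-width.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale if scale > 0 else w

def fidelity(w, w_hat):
    """Proxy for accuracy retention: relative reconstruction error."""
    return 1.0 - np.linalg.norm(w - w_hat) / np.linalg.norm(w)

# Re-training-free search over compression hyper-parameters: keep the
# cheapest setting whose fidelity stays above a tolerance.
best = None
for prune_ratio, bits in itertools.product([0.3, 0.5, 0.7], [8, 6, 4]):
    w_hat = compress(weights, prune_ratio, bits)
    cost = (1 - prune_ratio) * bits  # relative storage cost per weight
    if fidelity(weights, w_hat) >= 0.95 and (best is None or cost < best[0]):
        best = (cost, prune_ratio, bits)

print("best (cost, prune_ratio, bits):", best)
```

In the paper's setting, the fidelity proxy would be replaced by measured accuracy on a validation set, including any degradation introduced by running the compressed network on the target accelerator hardware.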