model compression;
pruning;
quantization;
parallelization;
neural network search;
methodology;
DOI:
10.1109/DAC56929.2023.10247892
CLC number:
TP18 [Artificial Intelligence Theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
As the range of neural network applications has exploded, various model compression techniques have been developed to increase the accuracy of neural networks under the resource constraints imposed by the hardware platform and the performance constraints required by users. This perspective paper briefly summarizes the current status and future prospects of the individual techniques, and highlights the importance of understanding the characteristics of the hardware platform and of a systematic methodology for applying these techniques in harmony.
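As a loose illustration only (not drawn from the paper itself), the short PyTorch sketch below combines two of the keyword techniques, magnitude-based pruning and post-training dynamic quantization, on a hypothetical toy model; the model, layer sizes, and 50% sparsity level are assumptions chosen purely for demonstration.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical toy model (not from the paper).
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Pruning: zero out the 50% smallest-magnitude weights in each Linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# Quantization: dynamically quantize Linear weights to int8 for inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # torch.Size([1, 10])

In practice, as the abstract argues, the order, strength, and combination of such techniques should be chosen with the target hardware platform's characteristics in mind rather than applied in isolation.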