Vertical Layering of Quantized Neural Networks for Heterogeneous Inference

Cited by: 1
Authors
Wu, Hai [1 ]
He, Ruifei [1 ]
Tan, Haoru [1 ]
Qi, Xiaojuan [1 ]
Huang, Kaibin [1 ]
Institution
[1] Univ Hong Kong, Dept Elect & Elect Engn, Pok Fu Lam, Hong Kong, Peoples R China
Keywords
Bit-width scalable network; layered coding; multi-objective optimization; quantization-aware training
DOI
10.1109/TPAMI.2023.3319045
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although considerable progress has been made in neural network quantization for efficient inference, existing methods do not scale to heterogeneous devices: a dedicated model must be trained, transmitted, and stored for each specific hardware setting, incurring considerable model training and maintenance costs. In this paper, we study a new vertical-layered representation of neural network weights that encapsulates all quantized models in a single one. It represents weights as a group of bits (i.e., vertical layers) organized from the most significant bit (also called the basic layer) to less significant bits (i.e., enhance layers). Hence, a neural network with an arbitrary quantization precision can be obtained by adding the corresponding enhance layers to the basic layer. However, we empirically find that models obtained with existing quantization methods suffer severe performance degradation when adapted to the vertical-layered weight representation. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design incorporates a cascade downsampling mechanism with multi-objective optimization to train the shared source model weights so that they are updated simultaneously while accounting for the performance of all networks. After the model is trained, to construct a vertical-layered network, the lowest-bit-width quantized weights become the basic layer, and every bit dropped along the downsampling process acts as an enhance layer. Our design is extensively evaluated on the CIFAR-100 and ImageNet datasets. Experiments show that the proposed vertical-layered representation and the developed once QAT scheme effectively embody multiple quantized networks in a single model, allow one-time training, and deliver performance comparable to that of quantized models tailored to any specific bit-width.
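The vertical-layered representation described in the abstract is, at its core, a bit-plane decomposition of integer-quantized weights: the most significant bit-plane is the basic layer, and each further bit-plane is an enhance layer. The following is a minimal NumPy sketch of that idea only; the function names and the exact decomposition are illustrative assumptions, not the paper's code, and the paper's once QAT training scheme is not reproduced here.

```python
import numpy as np

def to_vertical_layers(q_weights: np.ndarray, bits: int) -> list[np.ndarray]:
    """Split unsigned integer-quantized weights into bit-planes, ordered
    from the most significant bit (basic layer) to the least significant
    bits (enhance layers). Illustrative sketch, not the paper's code."""
    assert q_weights.min() >= 0 and q_weights.max() < 2 ** bits
    return [(q_weights >> (bits - 1 - i)) & 1 for i in range(bits)]

def from_vertical_layers(layers: list[np.ndarray], bits: int, use: int) -> np.ndarray:
    """Reconstruct a `use`-bit model from the basic layer plus the first
    (use - 1) enhance layers; the remaining bit-planes are simply dropped."""
    assert 1 <= use <= bits == len(layers)
    acc = np.zeros_like(layers[0])
    for i in range(use):
        acc = (acc << 1) | layers[i]  # append the next bit-plane
    return acc

# Example: 8-bit source weights served as a 4-bit model on a weaker device.
w8 = np.array([200, 37, 129], dtype=np.uint8)
planes = to_vertical_layers(w8.astype(np.int32), 8)
w4 = from_vertical_layers(planes, 8, 4)  # keeps the top 4 bit-planes, i.e. w8 >> 4
```

Because a lower-precision model is a prefix of the bit-planes, a device only needs to download the basic layer plus as many enhance layers as its hardware supports, rather than a separate model per bit-width.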
Pages: 15964-15978 (15 pages)
Related papers (50 total)
  • [1] Inference of Quantized Neural Networks on Heterogeneous All-Programmable Devices
    Preusser, Thomas B.; Gambardella, Giulio; Fraser, Nicholas; Blott, Michaela
    Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018: 833-838
  • [2] Simulating quantized inference on convolutional neural networks
    Finotti, Vitor; Albertini, Bruno
    Computers & Electrical Engineering, 2021, 95
  • [3] FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference
    Ding, Ruizhou; Liu, Zeye; Chin, Ting-Wu; Marculescu, Diana; Blanton, R. D.
    Proceedings of the 2019 56th ACM/EDAC/IEEE Design Automation Conference (DAC), 2019
  • [4] Quantized Deep Neural Networks for Energy Efficient Hardware-based Inference
    Ding, Ruizhou; Liu, Zeye; Blanton, R. D.; Marculescu, Diana
    2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), 2018: 1-8
  • [5] An Enhanced Vertical Handover Based on Fuzzy Inference MADM Approach for Heterogeneous Networks
    Ben Zineb, Aymen; Ayadi, Mohamed; Tabbane, Sami
    Arabian Journal for Science and Engineering, 2017, 42 (08): 3263-3274
  • [6] QoS-Aware Scheduling of Heterogeneous Servers for Inference in Deep Neural Networks
    Fang, Zhou; Yu, Tong; Mengshoel, Ole J.; Gupta, Rajesh K.
    CIKM'17: Proceedings of the 2017 ACM Conference on Information and Knowledge Management, 2017: 2067-2070
  • [7] Link Inference via Heterogeneous Multi-view Graph Neural Networks
    Xing, Yuying; Li, Zhao; Hui, Pengrui; Huang, Jiaming; Chen, Xia; Zhang, Long; Yu, Guoxian
    Database Systems for Advanced Applications (DASFAA 2020), Pt I, 2020, 12112: 698-+
  • [8] An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks
    Pochelu, Pierrick; Petiton, Serge G.; Conche, Bruno
    2021 IEEE International Conference on Big Data (Big Data), 2021: 5225-5232