Vertical Layering of Quantized Neural Networks for Heterogeneous Inference

Cited by: 1
Authors
Wu, Hai [1]
He, Ruifei [1]
Tan, Haoru [1]
Qi, Xiaojuan [1]
Huang, Kaibin [1]
Affiliation
[1] Univ Hong Kong, Dept Elect & Elect Engn, Pok Fu Lam, Hong Kong, Peoples R China
Keywords
Bit-width scalable network; layered coding; multi-objective optimization; quantization-aware training
DOI
10.1109/TPAMI.2023.3319045
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although considerable progress has been made in neural network quantization for efficient inference, existing methods do not scale to heterogeneous devices, because a dedicated model must be trained, transmitted, and stored for each specific hardware setting, incurring considerable costs in model training and maintenance. In this paper, we study a new vertical-layered representation of neural network weights that encapsulates all quantized models in a single one. It represents weights as a group of bits (i.e., vertical layers) organized from the most significant bit (also called the basic layer) to less significant bits (i.e., enhance layers). Hence, a neural network with an arbitrary quantization precision can be obtained by adding the corresponding enhance layers to the basic layer. However, we empirically find that models obtained with existing quantization methods suffer severe performance degradation when adapted to the vertical-layered weight representation. To this end, we propose a simple once quantization-aware training (QAT) scheme for obtaining high-performance vertical-layered models. Our design combines a cascade downsampling mechanism with multi-objective optimization to train the shared source model weights, so that they are updated simultaneously while accounting for the performance of all networks. After the model is trained, a vertical-layered network is constructed as follows: the lowest bit-width quantized weights become the basic layer, and every bit dropped along the downsampling process acts as an enhance layer. Our design is extensively evaluated on the CIFAR-100 and ImageNet datasets. Experiments show that the proposed vertical-layered representation and the once QAT scheme are effective in embodying multiple quantized networks in a single one and allow one-time training, and that they deliver performance comparable to that of quantized models tailored to any specific bit-width.
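The layered weight representation described in the abstract can be illustrated with a minimal sketch. It assumes unsigned integer weight codes and plain bit truncation when dropping precision; the paper's cascade downsampling may use a different rule, and the function names `split_layers` and `assemble` are hypothetical, chosen only for this illustration.

```python
from typing import List, Tuple


def split_layers(code: int, full_bits: int, base_bits: int) -> Tuple[int, List[int]]:
    """Split an unsigned full_bits-wide code into a basic layer and enhance bits.

    Assumption: precision is reduced by truncating low-order bits.
    """
    if not 0 <= code < (1 << full_bits):
        raise ValueError("code does not fit in full_bits")
    if not 1 <= base_bits <= full_bits:
        raise ValueError("base_bits must be between 1 and full_bits")
    # Basic layer: the top base_bits bits of the code.
    base = code >> (full_bits - base_bits)
    # Enhance layers: the remaining bits, most significant first.
    enhance = [(code >> shift) & 1
               for shift in range(full_bits - base_bits - 1, -1, -1)]
    return base, enhance


def assemble(base: int, enhance: List[int], base_bits: int, target_bits: int) -> int:
    """Rebuild a target_bits-wide code from the basic layer plus enhance bits."""
    extra = target_bits - base_bits
    if not 0 <= extra <= len(enhance):
        raise ValueError("target_bits is outside the stored range")
    code = base
    for bit in enhance[:extra]:
        # Appending one enhance layer adds one bit of precision.
        code = (code << 1) | bit
    return code


# An 8-bit code stored as a 2-bit basic layer plus six 1-bit enhance layers.
base, enhance = split_layers(0b10110110, full_bits=8, base_bits=2)
four_bit = assemble(base, enhance, base_bits=2, target_bits=4)
```

Under these assumptions, a device that supports only 4-bit weights would fetch the basic layer and the first two enhance layers, while an 8-bit device would fetch all of them; both read from the same stored model.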
Pages: 15964-15978
Page count: 15