ILD-MPQ: Learning-Free Mixed-Precision Quantization with Inter-Layer Dependency Awareness

Authors
Xu, Ruge [1]
Duan, Qiang [2]
Chen, Qibin [2]
Guo, Xinfei [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Univ Michigan Shanghai Jiao Tong Univ Joint Inst, Shanghai, Peoples R China
[2] Inspur Acad Sci & Technol, Jinan, Peoples R China
Keywords
Edge AI; Mixed-precision Quantization; Inter-layer Dependency
DOI
10.1109/AICAS59952.2024.10595945
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the increasing adoption of mixed-precision quantization (MPQ) on edge AI devices, deep neural networks (DNNs) can achieve a satisfactory balance between accuracy and efficiency. However, many existing MPQ methods assume inter-layer independence in DNNs and optimize bit-width schemes at the single-layer level, which leads to an additional loss of accuracy. Recently, several works have examined inter-layer dependency and applied it to finding optimal MPQ schemes. These works either rely on learning-based solutions that offer limited explainability or lack empirical validation of the heuristics they use. In this paper, we investigate the factors that lead to inter-layer dependency and propose a learning-free, inter-layer dependency-aware search method that uses the NSGA-II algorithm and leverages a novel per-layer influence metric. Evaluation results on MobileNetV2 and ResNet50 models demonstrate that the proposed method enhances the efficiency of post-training quantization (PTQ) models by 8.7% to 65.3% compared to state-of-the-art learning-free approaches, and keeps the loss of model efficiency within 4.0% to 8.9% while reducing time costs by 90% compared to learning-based approaches, all under similar hardware consumption constraints.
Pages: 512-516
Number of pages: 5
Related papers
1 item in total
  • [1] Edge-MPQ: Layer-Wise Mixed-Precision Quantization With Tightly Integrated Versatile Inference Units for Edge Computing
    Zhao, Xiaotian
    Xu, Ruge
    Gao, Yimin
    Verma, Vaibhav
    Stan, Mircea R.
    Guo, Xinfei
    IEEE TRANSACTIONS ON COMPUTERS, 2024, 73 (11) : 2504 - 2519