ILD-MPQ: Learning-Free Mixed-Precision Quantization with Inter-Layer Dependency Awareness
Cited by: 0
Authors:
Xu, Ruge [1]
Duan, Qiang [2]
Chen, Qibin [2]
Guo, Xinfei [1]
Affiliations:
[1] Shanghai Jiao Tong Univ, Univ Michigan Shanghai Jiao Tong Univ Joint Inst, Shanghai, Peoples R China
[2] Inspur Acad Sci & Technol, Jinan, Peoples R China
Keywords:
Edge AI;
Mixed-precision Quantization;
Interlayer Dependency;
DOI:
10.1109/AICAS59952.2024.10595945
CLC Number:
TP18 [Artificial Intelligence Theory];
Discipline Codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
With the increasing adoption of mixed-precision quantization (MPQ) on edge AI devices, deep neural networks (DNNs) can achieve a satisfactory balance between accuracy and efficiency. However, many existing MPQ methods assume inter-layer independence in DNNs and focus on optimizing bit-width schemes at the single-layer level, leading to additional accuracy loss. Recently, several works have looked into inter-layer dependency and applied it to finding optimal MPQ schemes. These works either relied on learning-based solutions that offer limited interpretability or lacked empirical validation of their various heuristics. In this paper, we investigate the factors that give rise to inter-layer dependency and propose a learning-free, inter-layer dependency-aware search method using the NSGA-II algorithm, leveraging a novel per-layer influence metric. Evaluation results on MobileNetV2 and ResNet50 demonstrate that the proposed method enhances the efficiency of post-training quantization (PTQ) models by 8.7%~65.3% compared to state-of-the-art learning-free approaches, and keeps the loss of model efficiency within 4.0%~8.9% while reducing time costs by 90% compared to learning-based approaches, all under similar hardware consumption constraints.
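To make the search formulation concrete, below is a minimal sketch of an NSGA-II bit-width search of the kind the abstract describes, assuming the pymoo library. This is not the authors' implementation: the per-layer influence scores, layer parameter counts, and both objective proxies (an influence-weighted accuracy-loss surrogate and a weight-memory footprint) are hypothetical placeholders, since the paper's actual inter-layer dependency-aware influence metric and hardware constraints are not given in this record.

```python
# Sketch: NSGA-II search over per-layer bit-widths for mixed-precision
# quantization. All model statistics below are synthetic placeholders,
# NOT the paper's per-layer influence metric.
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.operators.sampling.rnd import IntegerRandomSampling
from pymoo.operators.crossover.sbx import SBX
from pymoo.operators.mutation.pm import PM
from pymoo.operators.repair.rounding import RoundingRepair
from pymoo.optimize import minimize

BITS = np.array([2, 4, 8])                 # candidate bit-widths per layer
N_LAYERS = 20                              # toy model depth (assumption)
rng = np.random.default_rng(0)
influence = rng.random(N_LAYERS)           # hypothetical per-layer influence scores
params = rng.integers(10_000, 1_000_000, N_LAYERS)  # hypothetical layer sizes

class MPQProblem(ElementwiseProblem):
    """Decision variables: one bit-width index per layer.
    Objective 1: proxy accuracy loss, weighting each layer's quantization
    error by its influence score (stand-in for the paper's metric).
    Objective 2: total weight-memory footprint in bits."""
    def __init__(self):
        super().__init__(n_var=N_LAYERS, n_obj=2,
                         xl=0, xu=len(BITS) - 1, vtype=int)

    def _evaluate(self, x, out, *args, **kwargs):
        bits = BITS[x.astype(int)]
        # Quantization error shrinks roughly exponentially with bit-width.
        acc_loss = float(np.sum(influence * 2.0 ** (-bits)))
        size_bits = float(np.sum(params * bits))
        out["F"] = [acc_loss, size_bits]

# Integer-valued NSGA-II setup, following pymoo's rounding-repair pattern.
algorithm = NSGA2(
    pop_size=40,
    sampling=IntegerRandomSampling(),
    crossover=SBX(prob=0.9, eta=15, vtype=float, repair=RoundingRepair()),
    mutation=PM(eta=20, vtype=float, repair=RoundingRepair()),
    eliminate_duplicates=True,
)

res = minimize(MPQProblem(), algorithm, ("n_gen", 100), seed=1, verbose=False)
# res.X holds Pareto-optimal bit-width index vectors; res.F their objectives.
for x, f in zip(res.X[:3], res.F[:3]):
    print("bits:", BITS[x.astype(int)], "-> (acc-loss proxy, size bits):", f)
```

Because the search is evolutionary and the objectives are evaluated without any gradient training, the approach stays learning-free; the Pareto front then lets a deployer trade accuracy against footprint under a given hardware budget.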
Pages: 512-516
Page count: 5