Attention Round for post-training quantization

Cited by: 9

Authors:

Diao, Huabin [1]

Li, Gongyan [2]

Xu, Shaoyun [2]

Kong, Chao [1]

Wang, Wei [1]

Affiliations:

[1] Anhui Polytech Univ, Beijing Middle Rd, Wuhu 241000, Anhui, Peoples R China

[2] Chinese Acad Sci, Inst Microelect, 3 Beituocheng West Rd, Beijing 100029, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Vol. 565

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Keywords:

Convolutional neural networks; Post-training quantization; Attention Round; Mixed precision

DOI:

10.1016/j.neucom.2023.127012

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Quantization methods for convolutional neural network models can be broadly categorized into post-training quantization (PTQ) and quantization-aware training (QAT). While PTQ has the advantage of requiring only a small portion of the data for quantization, the resulting quantized model may not perform as well as one obtained with QAT. To address this limitation, this paper proposes a novel quantization function named Attention Round. Unlike traditional quantization functions, which map a 32-bit floating-point value w to nearby quantization levels, Attention Round allows w to be mapped to any quantization level in the entire quantization space, expanding the quantization optimization space. The probability of mapping w to a given quantization level decreases with the distance between w and that level, as regulated by a Gaussian decay function. Furthermore, to tackle the challenge of mixed-precision quantization, this paper introduces a lossy coding length measure to assign quantization precision to different layers of the model, eliminating the need to solve a combinatorial optimization problem. Experimental evaluations on various models demonstrate the effectiveness of the proposed method. Notably, for ResNet18 and MobileNetV2, the PTQ approach achieves quantization performance comparable to QAT while using only 1024 training samples and 10 minutes for the quantization process.
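The abstract describes both ideas only at a high level. The following is a minimal Python sketch, not the authors' code, under stated assumptions: a signed uniform b-bit grid with step `scale`, and a Gaussian weighting exp(-(w - q)^2 / (2 sigma^2)) over every level q, normalized into probabilities from which one level is sampled. `scale`, `sigma`, and all function names are hypothetical. For the mixed-precision part, it assumes the standard lossy coding length of Ma et al. (2007) in its zero-mean form, L(W) = (m + n)/2 · log2 det(I + n/(eps^2 m) · W W^T) for an n × m matrix W. The abstract does not say how the paper turns these lengths into per-layer bit-widths, so that step is left out.

```python
import math
import random


def quant_levels(scale, bits):
    """Hypothetical signed uniform grid: scale * k for k in [-2^(bits-1), 2^(bits-1) - 1]."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [scale * k for k in range(lo, hi + 1)]


def attention_probs(w, levels, sigma):
    """Gaussian-decay probability of mapping w to each level (closer levels weigh more)."""
    logits = [-((w - q) ** 2) / (2.0 * sigma ** 2) for q in levels]
    top = max(logits)  # subtract the max before exp for numerical stability
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [x / total for x in weights]


def attention_round(w, levels, sigma, rng=random):
    """Sample one level for w from the WHOLE grid, not just the two neighbours."""
    probs = attention_probs(w, levels, sigma)
    return rng.choices(levels, weights=probs, k=1)[0]


def logdet_spd(a):
    """Natural log of det(a) for a symmetric positive-definite matrix, via Cholesky."""
    n = len(a)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def coding_length(W, eps):
    """Lossy coding length in bits (Ma et al., 2007, mean term omitted) of the m columns
    of an n x m matrix W:  (m + n) / 2 * log2 det(I + n / (eps^2 * m) * W W^T)."""
    n, m = len(W), len(W[0])
    c = n / (eps ** 2 * m)
    gram = [[(1.0 if i == j else 0.0) + c * sum(W[i][k] * W[j][k] for k in range(m))
             for j in range(n)]
            for i in range(n)]
    return (m + n) / 2.0 * logdet_spd(gram) / math.log(2.0)


if __name__ == "__main__":
    levels = quant_levels(scale=0.1, bits=4)  # 16 levels, roughly -0.8 to 0.7
    rng = random.Random(0)
    print([attention_round(0.23, levels, sigma=0.05, rng=rng) for _ in range(5)])
    # Hypothetical layer: rows as output channels, columns as flattened inputs.
    print(coding_length([[0.5, -0.2, 0.1], [0.3, 0.0, -0.4]], eps=0.5))
```

In this sketch, as `sigma` shrinks the distribution concentrates on the nearest level and sampling reduces to round-to-nearest; as it grows, distant levels become reachable, which corresponds to the enlarged quantization space the abstract describes.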

Pages: 10
