OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks

Cited: 0
Authors
Xiang, Jingyang [1]
Chen, Zuohui [2]
Li, Siqi [1]
Wu, Qing [3]
Liu, Yong [1,4]
Affiliations
[1] Zhejiang Univ, APRIL Lab, Hangzhou, Peoples R China
[2] Zhejiang Univ Technol, Hangzhou, Peoples R China
[3] Hangzhou Dianzi Univ, Coll Comp Sci, Hangzhou, Peoples R China
[4] Zhejiang Univ, Huzhou Inst, Hangzhou, Peoples R China
Source
COMPUTER VISION - ECCV 2024, PT XXXIII | 2025, Vol. 15091
Keywords
Binary Neural Networks; Silent Weights; Adaptive Gradient Scaling; Silence Awareness Decaying;
DOI
10.1007/978-3-031-73414-4_1
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Binary Neural Networks (BNNs) have been proven to be highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization errors, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50% of the weights keep their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but also universally present in the vicinity of zero. We refer to these weights as "silent weights", which slow down convergence and lead to a significant accuracy degradation. Theoretically, we reveal this is due to the independence of the BNN gradient from the latent weight distribution. To address the issue, we propose Overcoming Silent Weights (OvSW). OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying (SAD) to automatically identify "silent weights" by tracking the weight flipping state, and apply an additional penalty to "silent weights" to facilitate their flipping. By efficiently updating weight signs, our method achieves faster convergence and state-of-the-art performance on the CIFAR10 and ImageNet1K datasets with various architectures. For example, OvSW obtains 61.6% and 65.5% top-1 accuracy on ImageNet1K using binarized ResNet18 and ResNet34 architectures, respectively. Code is available at https://github.com/JingyangXiang/OvSW.
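The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the exact AGS scaling rule and SAD penalty schedule differ in the paper; here AGS is sketched as rescaling the latent-weight gradient by the ratio of weight norm to gradient norm (tying update magnitude to the latent weight distribution), and SAD as a small decay-style push toward zero for weights whose signs have not flipped for `threshold` steps. All function names and hyperparameters are illustrative.

```python
import numpy as np

def ags_scale(grad, w, eps=1e-8):
    """Adaptive Gradient Scaling (sketch): rescale the gradient so its
    norm tracks the latent weight norm, coupling update size to the
    latent weight distribution."""
    scale = np.linalg.norm(w) / (np.linalg.norm(grad) + eps)
    return grad * scale

def sad_penalty(w, flip_counter, threshold, strength=1e-4):
    """Silence Awareness Decaying (sketch): weights whose signs have not
    flipped for `threshold` consecutive steps get an extra gradient term
    pushing them toward zero, making a future sign flip easier."""
    silent = flip_counter >= threshold
    return np.where(silent, strength * np.sign(w), 0.0)

def train_step(w, grad, flip_counter, lr=0.1, threshold=3):
    """One latent-weight update with AGS and SAD, tracking per-weight
    steps since the last sign flip."""
    prev_sign = np.sign(w)
    g = ags_scale(grad, w) + sad_penalty(w, flip_counter, threshold)
    w = w - lr * g
    flipped = np.sign(w) != prev_sign
    flip_counter = np.where(flipped, 0, flip_counter + 1)  # reset on flip
    return w, flip_counter
```

A weight that stalls near zero without flipping accumulates flip-counter steps until the SAD term kicks in, while AGS keeps gradient magnitudes from being dominated by (or decoupled from) the latent weight scale.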
Pages: 1-18
Page count: 18