Delving into the Estimation Shift of Batch Normalization in a Network

Cited by: 19
Authors
Huang, Lei [1]
Zhou, Yi [2]
Wang, Tian [1]
Luo, Jie [1]
Liu, Xianglong [1]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, SKLSDE, Beijing, Peoples R China
[2] Southeast Univ, MOE Key Lab Comp Network & Informat Integrat, Nanjing, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52688.2022.00084
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Batch normalization (BN) is a milestone technique in deep learning. It normalizes activations using mini-batch statistics during training but uses estimated population statistics during inference. This paper investigates the estimation of those population statistics. We define the estimation shift magnitude of BN to quantitatively measure the difference between its estimated population statistics and the expected ones. Our primary observation is that the estimation shift can accumulate as BN layers are stacked in a network, which has detrimental effects on test performance. We further find that a batch-free normalization (BFN) can block this accumulation of estimation shift. These observations motivate our design of XBNBlock, which replaces one BN with BFN in the bottleneck block of residual-style networks. Experiments on the ImageNet and COCO benchmarks show that XBNBlock consistently improves the performance of different architectures, including ResNet and ResNeXt, by a significant margin and seems to be more robust to distribution shift.
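The train/inference split described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. The exponential moving average with `momentum=0.1`, the synthetic Gaussian data, and the helper `shift_magnitude` (a relative Euclidean distance) are all assumptions made for illustration; this record does not give the paper's exact definition of estimation shift magnitude.

```python
import numpy as np

def bn_train_step(x, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Normalize x with mini-batch statistics and update the running estimates."""
    mu = x.mean(axis=0)    # per-feature mini-batch mean
    var = x.var(axis=0)    # per-feature mini-batch (biased) variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Exponential moving average: an assumed, commonly used estimator.
    new_mean = (1 - momentum) * running_mean + momentum * mu
    new_var = (1 - momentum) * running_var + momentum * var
    return x_hat, new_mean, new_var

def bn_inference(x, running_mean, running_var, eps=1e-5):
    """Normalize x with the estimated population statistics."""
    return (x - running_mean) / np.sqrt(running_var + eps)

def shift_magnitude(estimated, expected):
    """Hypothetical measure: relative distance between estimated and expected statistics."""
    return np.linalg.norm(estimated - expected) / np.linalg.norm(expected)

# Synthetic data with known population statistics (assumption for the demo).
rng = np.random.default_rng(0)
true_mean, true_std = 2.0, 3.0
running_mean = np.zeros(4)
running_var = np.ones(4)

for _ in range(200):
    batch = rng.normal(true_mean, true_std, size=(32, 4))
    x_hat, running_mean, running_var = bn_train_step(batch, running_mean, running_var)

mean_shift = shift_magnitude(running_mean, np.full(4, true_mean))
var_shift = shift_magnitude(running_var, np.full(4, true_std ** 2))
test_out = bn_inference(rng.normal(true_mean, true_std, size=(8, 4)),
                        running_mean, running_var)
print(f"mean shift: {mean_shift:.3f}, variance shift: {var_shift:.3f}")
```

In this single-layer toy setting the expected statistics are known, so the shift is simply the estimation error of the running averages. The accumulation across stacked BN layers that the abstract reports is not reproduced here.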
Pages: 753-762
Page count: 10