LOW BIT NEURAL NETWORK QUANTIZATION FOR SPEAKER VERIFICATION

Cited by: 0
Authors
Wang, Haoyu [1 ]
Liu, Bei [1 ]
Wu, Yifei [1 ]
Chen, Zhengyang [1 ]
Qian, Yanmin [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, AI Inst, Dept Comp Sci & Engn, MoE Key Lab of Artificial Intelligence, X-LANCE Lab, Shanghai, Peoples R China
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW | 2023
Keywords
speaker verification; neural network quantization; model compression; mixed precision quantization;
DOI
10.1109/ICASSPW59220.2023.10193337
CLC Number
O42 [Acoustics];
Subject Classification Code
070206 ; 082403 ;
Abstract
With the continuous development of deep neural networks (DNNs) in recent years, the performance of speaker verification systems has improved significantly through the application of deeper ResNet architectures. However, these deeper models occupy more storage space in deployment. In this paper, we adopt the Alternating Direction Method of Multipliers (ADMM) to realize low-bit quantization of the original ResNets. Our goal is to explore the maximal quantization compression without evident degradation in model performance. We apply a different uniform quantization to each convolutional layer, achieving mixed-precision quantization of the entire model. Moreover, we explore the impact of batch normalization layers on ADMM training and the sensitivity of individual layers to quantization. In our experiments, the 8-bit quantized ResNet152 achieves results comparable to the full-precision model on VoxCeleb1 with only 45% of the original model size. In addition, we find that shallow convolutional layers are more sensitive to quantization, and experimental results indicate that model performance degrades severely if batch normalization layers are folded into the convolutional layers before quantization training starts.
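Concretely, ADMM-based quantization of the kind the abstract describes alternates between SGD on a penalized task loss and a projection of the weights onto a uniform low-bit grid. Below is a minimal Python sketch of one such round for a single layer. It is not the authors' implementation: the names uniform_project and admm_round are hypothetical, the per-layer max-abs scaling is an assumption (the abstract does not specify the scale), and the SGD step on the penalized loss is left as a comment.

import torch

def uniform_project(w: torch.Tensor, n_bits: int) -> torch.Tensor:
    # Project weights onto the nearest point of a symmetric uniform
    # n-bit grid; the per-layer max-abs scale is an assumption, not
    # taken from the paper.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

def admm_round(w, g, lam, n_bits):
    # One ADMM round for a single layer (hypothetical helper).
    # w   : full-precision weights, updated elsewhere by SGD on
    #       loss(w) + (rho / 2) * ||w - g + lam||^2
    # g   : auxiliary quantized copy of the weights
    # lam : scaled dual variable enforcing the constraint w = g
    g = uniform_project(w + lam, n_bits)  # G-update: projection step
    lam = lam + w - g                     # dual update
    return g, lam

# Example: ten ADMM rounds of 4-bit quantization for one conv layer.
w = torch.randn(64, 3, 3, 3)
g, lam = uniform_project(w, 4), torch.zeros_like(w)
for _ in range(10):
    # (an SGD step on the penalized task loss would update w here)
    g, lam = admm_round(w, g, lam, n_bits=4)

Mixed-precision quantization, as described in the abstract, would simply run this procedure with a different n_bits per convolutional layer.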
Pages: 5