Towards Fully Quantized Neural Networks For Speech Enhancement

Cited by: 2
Authors
Cohen, Elad [1 ]
Habi, Hai Victor [1 ]
Netzer, Arnon [1 ]
Affiliations
[1] Sony Semicond Israel, Hod Hasharon, Israel
Source
INTERSPEECH 2023 | 2023
Keywords
Speech Enhancement; Quantization; CNN;
DOI
10.21437/Interspeech.2023-883
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Deep learning models have shown state-of-the-art results in speech enhancement. However, deploying such models on an eight-bit integer-only device is challenging. In this work, we analyze the gaps in deploying a vanilla quantization-aware training method for speech enhancement, revealing two significant observations. First, quantization mainly affects signals with a high input Signal-to-Noise Ratio (SNR). Second, quantizing the model's input and output causes major performance degradation. Based on our analysis, we propose Fully Quantized Speech Enhancement (FQSE), a new quantization-aware training method that closes these gaps and enables eight-bit integer-only quantization. FQSE introduces data augmentation to mitigate the quantization effect on high SNR. Additionally, we add an input splitter and a residual quantization block to the model to overcome the error of the input-output quantization. We show that FQSE closes the performance gaps induced by eight-bit quantization.
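The abstract builds on quantization-aware training, in which real-valued tensors are "fake-quantized" during training: scaled onto an eight-bit integer grid, rounded, clipped, and scaled back to floats. The sketch below is a minimal, generic illustration of that operation, not the paper's FQSE method; the function name `fake_quantize` and all constants are assumptions for illustration only.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate symmetric signed integer quantization of a tensor.

    The tensor is scaled onto the integer grid, rounded, clipped to the
    representable range, and scaled back to floats -- the standard
    "fake quantization" used during quantization-aware training.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8-bit signed
    scale = max(np.max(np.abs(x)) / qmax, 1e-8)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# The round-trip error is bounded by half a quantization step (scale / 2).
x = np.linspace(-1.0, 1.0, 9)
xq = fake_quantize(x)
assert np.max(np.abs(x - xq)) <= 1.0 / (2 * 127) + 1e-12
```

In actual QAT the rounding step is non-differentiable, so gradients are typically passed through it unchanged (the straight-through estimator, as in the Bengio 2013 reference below).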
Pages: 181-185 (5 pages)
References (25 entries)
[1]   Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask [J].
Abdullah, Salinna ;
Zamani, Majid ;
Demosthenous, Andreas .
IEEE ACCESS, 2021, 9 :24350-24362
[2]  
Bengio Y., 2013, arXiv preprint, arXiv:1308.3432
[3]   Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation [J].
Chen, Jingjing ;
Mao, Qirong ;
Liu, Dong .
INTERSPEECH 2020, 2020, :2642-2646
[4]  
Cohen E., 2023, Towards fully quantized neural networks for speech enhancement
[5]  
Cosentino J., 2020, arXiv preprint, arXiv:2005.11262
[6]  
Detlefsen N. S., 2022, J. Open Sour. Softw., V7, P4101, DOI 10.21105/joss.04101
[7]  
Esser S. K., 2020, International Conference on Learning Representations (ICLR)
[8]   METRICGAN-U: UNSUPERVISED SPEECH ENHANCEMENT/ DEREVERBERATION BASED ONLY ON NOISY/ REVERBERATED SPEECH [J].
Fu, Szu-Wei ;
Yu, Cheng ;
Hung, Kuo-Hsuan ;
Ravanelli, Mirco ;
Tsao, Yu .
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :7412-7416
[9]  
Gaikwad S. K., 2010, Int. J. Comput. Appl., V10, P16, DOI 10.5120/1462-1976
[10]  
Gholami A., 2021, arXiv preprint, DOI 10.48550/arXiv.2103.13630