Towards Fully Quantized Neural Networks For Speech Enhancement

Cited by: 2
Authors
Cohen, Elad [1 ]
Habi, Hai Victor [1 ]
Netzer, Arnon [1 ]
Affiliations
[1] Sony Semicond Israel, Hod Hasharon, Israel
Source
INTERSPEECH 2023 | 2023
Keywords
Speech Enhancement; Quantization; CNN;
DOI
10.21437/Interspeech.2023-883
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Deep learning models have shown state-of-the-art results in speech enhancement. However, deploying such models on an eight-bit integer-only device is challenging. In this work, we analyze the gaps in deploying a vanilla quantization-aware training method for speech enhancement, revealing two significant observations. First, quantization mainly affects signals with a high input Signal-to-Noise Ratio (SNR). Second, quantizing the model's input and output causes major performance degradation. Based on our analysis, we propose Fully Quantized Speech Enhancement (FQSE), a new quantization-aware training method that closes these gaps and enables eight-bit integer-only quantization. FQSE introduces data augmentation to mitigate the quantization effect on high SNR. Additionally, we add an input splitter and a residual quantization block to the model to overcome the error of the input-output quantization. We show that FQSE closes the performance gaps induced by eight-bit quantization.
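The abstract builds on quantization-aware training, in which real-valued tensors are "fake-quantized" during training: scaled onto an eight-bit integer grid, rounded, clipped, and scaled back to floats. The sketch below is a minimal, generic illustration of that operation, not the paper's FQSE method; the function name `fake_quantize` and all constants are assumptions for illustration only.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate symmetric signed integer quantization of a tensor.

    The tensor is scaled onto the integer grid, rounded, clipped to the
    representable range, and scaled back to floats -- the standard
    "fake quantization" used during quantization-aware training.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8-bit signed
    scale = max(np.max(np.abs(x)) / qmax, 1e-8)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# The round-trip error is bounded by half a quantization step (scale / 2).
x = np.linspace(-1.0, 1.0, 9)
xq = fake_quantize(x)
assert np.max(np.abs(x - xq)) <= 1.0 / (2 * 127) + 1e-12
```

In actual QAT the rounding step is non-differentiable, so gradients are typically passed through it unchanged (the straight-through estimator, as in the Bengio 2013 reference below).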
Pages: 181-185 (5 pages)
References (25 entries)
[1]   Towards More Efficient DNN-Based Speech Enhancement Using Quantized Correlation Mask [J].
Abdullah, Salinna ;
Zamani, Majid ;
Demosthenous, Andreas .
IEEE ACCESS, 2021, 9 :24350-24362
[2]  
Bengio Y., 2013, arXiv preprint, arXiv:1308.3432
[3]   Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation [J].
Chen, Jingjing ;
Mao, Qirong ;
Liu, Dong .
INTERSPEECH 2020, 2020, :2642-2646
[4]  
Cohen E., 2023, Towards fully quantized neural networks for speech enhancement
[5]  
Cosentino J., 2020, arXiv preprint, arXiv:2005.11262
[6]  
Detlefsen N. S., 2022, J. Open Sour. Softw., V7, P4101, DOI 10.21105/joss.04101
[7]  
Esser S. K., 2020, International Conference on Learning Representations (ICLR)
[8]   METRICGAN-U: UNSUPERVISED SPEECH ENHANCEMENT/ DEREVERBERATION BASED ONLY ON NOISY/ REVERBERATED SPEECH [J].
Fu, Szu-Wei ;
Yu, Cheng ;
Hung, Kuo-Hsuan ;
Ravanelli, Mirco ;
Tsao, Yu .
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, :7412-7416
[9]  
Gaikwad S. K., 2010, Int. J. Comput. Appl., V10, P16, DOI 10.5120/1462-1976
[10]  
Gholami A., 2021, arXiv preprint, DOI 10.48550/arXiv.2103.13630