Fully Quantized Neural Networks for Audio Source Separation

Cited by: 0
Authors
Cohen, Elad [1]
Habi, Hai Victor [1]
Peretz, Reuven [1]
Netzer, Arnon [1]
Affiliations
[1] Sony Semicond Israel, IL-4524079 Hod Hasharon, Israel
Source
IEEE OPEN JOURNAL OF SIGNAL PROCESSING | 2024, Vol. 5
Keywords
Quantization (signal); Task analysis; Source separation; Analytical models; Training; Degradation; Computational modeling; quantization; DNN; SDR; compression; knowledge distillation
DOI
10.1109/OJSP.2024.3425287
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline Code
0808; 0809
Abstract
Deep neural networks have shown state-of-the-art results in audio source separation tasks in recent years. However, deploying such networks, especially on edge devices, is challenging due to memory and computation requirements. In this work, we focus on quantization, a leading approach for addressing these challenges. We start with a theoretical and empirical analysis of the signal-to-distortion ratio (SDR) in the presence of quantization noise, which presents a fundamental limitation in audio source separation tasks. These analyses show that quantization noise mainly affects performance when the model produces high SDRs. We empirically validate the theoretical insights and illustrate them on audio source separation models. In addition, the empirical analysis shows a high sensitivity to activation quantization, especially of the network's input and output signals. Following the analysis, we propose Fully Quantized Source Separation (FQSS), a quantization-aware training (QAT) method for audio source separation tasks. FQSS introduces a novel loss function based on knowledge distillation that accounts for quantization-sensitive samples during training and handles the quantization noise of the input and output signals. We validate the effectiveness of our method in both the time and frequency domains. Finally, we apply FQSS to several architectures (CNNs, LSTMs, and Transformers) and show negligible degradation compared to the full-precision baseline models.
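A minimal sketch of the kind of ceiling the SDR analysis refers to, under simplifying assumptions (not the paper's derivation): write the quantized model's output as s_hat = s_fp + q, where s_fp is the full-precision output with error e = s_fp - s against the target source s, and q is additive quantization noise assumed independent of e. In LaTeX notation:

    \mathrm{SDR} \;=\; 10\log_{10}\frac{\lVert s \rVert^2}{\lVert e + q \rVert^2}
    \;\approx\; 10\log_{10}\frac{\lVert s \rVert^2}{\lVert e \rVert^2 + \lVert q \rVert^2}
    \;\le\; 10\log_{10}\frac{\lVert s \rVert^2}{\lVert q \rVert^2}.

When \lVert e \rVert^2 \gg \lVert q \rVert^2 (a weak separator), the quantization term is negligible and the quantized SDR tracks the full-precision SDR; as the model improves and \lVert e \rVert^2 shrinks, the SDR saturates at the floor set by \lVert q \rVert^2, consistent with the abstract's claim that quantization noise mainly matters in the high-SDR regime.

For the training objective, the PyTorch sketch below shows only the general shape of a QAT loss that combines a separation term with knowledge distillation from a full-precision teacher. The function name qat_distillation_loss, the MSE terms, and the alpha weighting are assumptions for illustration; the paper's actual FQSS loss additionally weights quantization-sensitive samples, which is not reproduced here.

    import torch
    import torch.nn.functional as F

    def qat_distillation_loss(student_out: torch.Tensor,
                              teacher_out: torch.Tensor,
                              target: torch.Tensor,
                              alpha: float = 0.5) -> torch.Tensor:
        # Task term: fit the quantized (student) output to the ground-truth
        # source. Any separation loss (e.g., negative SI-SDR) could be
        # substituted; MSE keeps the sketch self-contained.
        task = F.mse_loss(student_out, target)
        # Distillation term: pull the quantized output toward the
        # full-precision teacher's output; the teacher is not trained.
        distill = F.mse_loss(student_out, teacher_out.detach())
        return alpha * task + (1.0 - alpha) * distill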
Pages: 926-933
Page count: 8