Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Cited by: 0
Authors
Abdullah, Salinna [1]
Zamani, Majid [1, 2]
Demosthenous, Andreas [1]
Affiliations
[1] UCL, Dept Elect & Elect Engn, London WC1E 7JE, England
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, England
Source
IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS | 2024 / Vol. 5
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning; speech enhancement; structured pruning; ternary quantisation; PROCESSOR;
DOI
10.1109/OJCAS.2024.3389100
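Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract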
This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, or complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1× compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design on a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
Pages: 141-152
Number of pages: 12