Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Cited: 0
Authors
Abdullah, Salinna [1 ]
Zamani, Majid [1 ,2 ]
Demosthenous, Andreas [1 ]
Affiliations
[1] UCL, Dept Elect & Elect Engn, London WC1E 7JE, England
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, England
Source
IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS | 2024 / Vol. 5
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning; speech enhancement; structured pruning; ternary quantisation; PROCESSOR;
DOI
10.1109/OJCAS.2024.3389100
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training-target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, or complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structured pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1x compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) as a proof of concept. Mapping the design onto a 65-nm CMOS process gave a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
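The abstract names the compression and target-switching techniques without implementation detail. The Python sketch below illustrates, under assumed conventions, how threshold-based ternary quantisation and noise-aware switching between mapping-based and masking-based outputs could look; the 0.7 threshold heuristic, the SNR cut-offs, the averaging used for the "complementary" case, and all function names are illustrative assumptions, not the paper's actual method.

import numpy as np

def ternarise(weights: np.ndarray, delta_scale: float = 0.7) -> np.ndarray:
    # Illustrative threshold-based ternary quantisation (an assumption, not the
    # paper's exact scheme): weights below a per-layer magnitude threshold are
    # zeroed, the rest collapse to +/- one shared per-layer scale, so each
    # weight needs about 2 bits plus one stored scale factor.
    delta = delta_scale * np.mean(np.abs(weights))      # magnitude threshold
    keep = np.abs(weights) > delta                      # surviving connections
    alpha = np.mean(np.abs(weights[keep])) if keep.any() else 0.0
    return alpha * np.sign(weights) * keep              # values in {-alpha, 0, +alpha}

def select_output(mapped: np.ndarray, mask: np.ndarray, noisy: np.ndarray,
                  est_snr_db: float, low_db: float = 0.0,
                  high_db: float = 10.0) -> np.ndarray:
    # Hypothetical noise-aware target switching: mapping-based output under
    # heavy noise, estimated mask applied to the noisy spectrum under light
    # noise, and a simple average of the two as the "complementary" middle
    # ground. The SNR thresholds are placeholders, not values from the paper.
    if est_snr_db < low_db:
        return mapped
    if est_snr_db > high_db:
        return mask * noisy
    return 0.5 * (mapped + mask * noisy)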
Pages: 141 - 152
Number of pages: 12