Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Cited: 0
Authors
Abdullah, Salinna [1 ]
Zamani, Majid [1 ,2 ]
Demosthenous, Andreas [1 ]
Affiliations
[1] UCL, Dept Elect & Elect Engn, London WC1E 7JE, England
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, England
Source
IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS | 2024 / Vol. 5
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning; speech enhancement; structured pruning; ternary quantisation; PROCESSOR;
DOI
10.1109/OJCAS.2024.3389100
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training-target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, or complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structured pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1x compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) as a proof of concept. Mapping the design onto a 65-nm CMOS process gave a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
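The abstract names the compression and target-switching techniques without implementation detail. The Python sketch below illustrates, under assumed conventions, how threshold-based ternary quantisation and noise-aware switching between mapping-based and masking-based outputs could look; the 0.7 threshold heuristic, the SNR cut-offs, the averaging used for the "complementary" case, and all function names are illustrative assumptions, not the paper's actual method.

import numpy as np

def ternarise(weights: np.ndarray, delta_scale: float = 0.7) -> np.ndarray:
    # Illustrative threshold-based ternary quantisation (an assumption, not the
    # paper's exact scheme): weights below a per-layer magnitude threshold are
    # zeroed, the rest collapse to +/- one shared per-layer scale, so each
    # weight needs about 2 bits plus one stored scale factor.
    delta = delta_scale * np.mean(np.abs(weights))      # magnitude threshold
    keep = np.abs(weights) > delta                      # surviving connections
    alpha = np.mean(np.abs(weights[keep])) if keep.any() else 0.0
    return alpha * np.sign(weights) * keep              # values in {-alpha, 0, +alpha}

def select_output(mapped: np.ndarray, mask: np.ndarray, noisy: np.ndarray,
                  est_snr_db: float, low_db: float = 0.0,
                  high_db: float = 10.0) -> np.ndarray:
    # Hypothetical noise-aware target switching: mapping-based output under
    # heavy noise, estimated mask applied to the noisy spectrum under light
    # noise, and a simple average of the two as the "complementary" middle
    # ground. The SNR thresholds are placeholders, not values from the paper.
    if est_snr_db < low_db:
        return mapped
    if est_snr_db > high_db:
        return mask * noisy
    return 0.5 * (mapped + mask * noisy)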
Pages: 141 - 152
Number of pages: 12