Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Cited by: 0
Authors
Abdullah, Salinna [1]
Zamani, Majid [1, 2]
Demosthenous, Andreas [1]
Affiliations
[1] UCL, Dept Elect & Elect Engn, London WC1E 7JE, England
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, England
Source
IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS | 2024 / Vol. 5
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning; speech enhancement; structured pruning; ternary quantisation; PROCESSOR;
DOI
10.1109/OJCAS.2024.3389100
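Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code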
0808 ; 0809 ;
Abstract
This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, or complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1x compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design to a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
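As a rough illustration of the ternary quantisation mentioned in the abstract, the sketch below maps each weight to one of three values {-alpha, 0, +alpha} using a magnitude threshold. This is a generic threshold-based scheme, not necessarily the one used in the paper; the function names, the `delta_factor` constant, and the sample weights are all hypothetical.

```python
def ternarise(weights, delta_factor=0.7):
    """Quantise weights to codes in {-1, 0, +1} plus one shared scale alpha.

    Generic threshold-based ternarisation; delta_factor is an assumed
    tuning constant, not a value taken from the paper.
    """
    if not weights:
        return [], 0.0
    # Threshold: a fraction of the mean absolute weight.
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    # Weights with magnitude at or below the threshold become zero.
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # Scale: mean magnitude of the weights that survived the threshold.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def dequantise(codes, alpha):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [alpha * c for c in codes]


# Hypothetical example weights.
weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]
codes, alpha = ternarise(weights)
approx = dequantise(codes, alpha)
print(codes, alpha, approx)
```

Each ternary code fits in two bits, and the zero codes can be skipped by a sparse matrix representation, which is one way such a scheme can reduce weight storage.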
Pages: 141-152
Page count: 12