Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Authors:

Abdullah, Salinna [1]

Zamani, Majid [1,2]

Demosthenous, Andreas [1]

Affiliations:

[1] UCL, Dept Elect & Elect Engn, London WC1E 7JE, England

[2] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, England

Source:

IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS | 2024 / Volume 5

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC)

Keywords:

Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning; speech enhancement; structured pruning; ternary quantisation; PROCESSOR;

DOI:

10.1109/OJCAS.2024.3389100

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, or complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1x compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design onto a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
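Illustrative note: the abstract names ternary quantisation as one of the compression techniques but does not give the quantisation rule. The sketch below shows one common form of the idea, in which each weight is replaced by a code in {-1, 0, +1} and a single shared scale `alpha`. The function names, the threshold rule (a fraction of the mean absolute weight) and the `threshold_scale=0.7` default are assumptions chosen for illustration; they are not taken from the paper and may differ from the authors' method.

```python
def ternarize(weights, threshold_scale=0.7):
    """Quantise float weights to codes in {-1, 0, +1} plus one shared scale.

    threshold_scale is an assumed heuristic, not a value from the paper.
    """
    if not weights:
        return [], 0.0
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_scale * mean_abs  # magnitudes at or below delta become 0
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    # Shared scale: mean magnitude of the weights that were not zeroed.
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha


def dequantize(codes, alpha):
    """Rebuild approximate weights from ternary codes and the shared scale."""
    return [c * alpha for c in codes]


weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]
codes, alpha = ternarize(weights)
approx = dequantize(codes, alpha)
```

Each code needs only two bits, and the zeros created by the threshold are what a sparse matrix representation can then skip; this is the general reason such schemes reduce memory, independent of the specific figures reported in the abstract.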

Pages: 141-152

Page count: 12
