Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Cited by: 0
Authors
Abdullah, Salinna [1]
Zamani, Majid [1, 2]
Demosthenous, Andreas [1]
Affiliations
[1] UCL, Dept Elect & Elect Engn, London WC1E 7JE, England
[2] Univ Southampton, Sch Elect & Comp Sci, Southampton SO17 1BJ, England
Source
IEEE OPEN JOURNAL OF CIRCUITS AND SYSTEMS | 2024 / Vol. 5
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning; speech enhancement; structured pruning; ternary quantisation; PROCESSOR;
DOI
10.1109/OJCAS.2024.3389100
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, or complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1x compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design onto a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
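The abstract names ternary quantisation without giving its details. As a rough illustration only, the sketch below maps each weight to one of three values (-scale, 0, +scale) using a common threshold heuristic (threshold = 0.7 times the mean absolute weight). The function name `ternarize`, the `threshold_ratio` parameter, and the sample weights are all hypothetical; the record does not confirm that the paper uses this particular rule.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Quantise weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: a mean-magnitude threshold heuristic, chosen for
    illustration; it is not taken from the paper.
    """
    if not weights:
        return [], 0.0
    # Threshold below which a weight is treated as zero.
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    # Shared scale: mean magnitude of the weights that survive.
    kept = [abs(w) for w in weights if abs(w) > delta]
    scale = sum(kept) / len(kept) if kept else 0.0
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    return codes, scale


weights = [0.9, -0.05, 0.4, -0.8, 0.02, -0.3]  # made-up example values
codes, scale = ternarize(weights)
print(codes)  # [1, 0, 1, -1, 0, -1]
print(scale)
```

Each code needs only two bits, and the zero codes are what a sparse matrix representation can skip, which is one plausible reading of how quantisation and sparsity combine to reduce memory.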
Pages: 141-152
Page count: 12