Deep Learning Assisted Time-Frequency Processing for Speech Enhancement on Drones

Cited by: 21

Authors:

Wang, Lin [1]

Cavallaro, Andrea [1]

Affiliations:

[1] Queen Mary Univ London, Ctr Intelligent Sensing, London E1 4NS, England

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2021, Vol. 5, No. 6

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC); Innovate UK

Keywords:

Deep learning; Deep neural network (DNN); drone; ego-noise reduction; microphone array; SINGLE;

DOI:

10.1109/TETCI.2020.3014934

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNN) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate the ideal ratio masks at individual time-frequency bins, which are subsequently used to design three potential speech enhancement systems, namely single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF), and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm, which infers the noise-dominance probabilities at individual time-frequency bins from the DNN-estimated soft masks, and then incorporates them into a time-frequency spatial filtering framework for ego-noise reduction. By jointly exploiting the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise) and the time-frequency noise-dominance probability, DNN-TF can suppress the ego-noise effectively in scenarios with very low signal-to-noise ratios (e.g. SNR lower than -15 dB), especially when the direction of the target sound is close to that of a source of the ego-noise. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF and the state-of-the-art time-frequency spatial filtering.
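The abstract refers to ideal ratio masks estimated at individual time-frequency bins. As a rough illustration only, the sketch below computes one common form of the ideal ratio mask from known speech and noise magnitudes and applies it to a mixture. The square-root energy-ratio definition, the function names `ideal_ratio_mask` and `apply_mask`, and the `eps` stabiliser are assumptions made for this example; the record does not state the exact mask definition or the network used in the paper.

```python
import math

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Per-bin ratio mask from magnitude spectrograms given as [frame][bin] lists.

    Assumed definition (a common one, not confirmed by the record):
    mask = sqrt(|S|^2 / (|S|^2 + |N|^2)), with eps guarding against 0/0.
    """
    return [
        [math.sqrt(s * s / (s * s + n * n + eps)) for s, n in zip(s_frame, n_frame)]
        for s_frame, n_frame in zip(speech_mag, noise_mag)
    ]

def apply_mask(mixture_mag, mask):
    """Scale each time-frequency bin of the mixture magnitude by its mask value."""
    return [
        [m * g for m, g in zip(m_frame, g_frame)]
        for m_frame, g_frame in zip(mixture_mag, mask)
    ]

# Toy example: one frame, two frequency bins (values are made up).
speech = [[3.0, 0.0]]
noise = [[4.0, 2.0]]
mask = ideal_ratio_mask(speech, noise)
enhanced = apply_mask([[5.0, 2.0]], mask)
```

In a system like the one described, a network would estimate such a mask from the noisy input alone, since clean speech and noise are not available at test time; the computation above corresponds only to the training target.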

Pages: 871-881

Number of pages: 11
