End-to-End Noisy Speech Recognition Using Fourier and Hilbert Spectrum Features

Cited by: 10

Authors:

Vazhenina, Daria [1]

Markov, Konstantin [1]

Affiliations:

[1] Univ Aizu, Dept Comp Sci & Engn, Fukushima 9658580, Japan

Source:

ELECTRONICS | 2020 / Vol. 9 / Issue 07

Keywords:

feature extraction; noisy speech; Hilbert-Huang transform; feature combination; end-to-end models; empirical mode decomposition

DOI:

10.3390/electronics9071157

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Despite the progress of deep neural networks over the last decade, state-of-the-art speech recognizers are still far from reaching satisfactory performance in noisy environments. Methods to improve noise robustness usually add components to the recognition system, and these components often need optimization. For this reason, data augmentation of the input features derived from the Short-Time Fourier Transform (STFT) has become a popular approach. However, for many speech processing tasks, there is evidence that combining STFT-based and Hilbert-Huang transform (HHT)-based features improves overall performance. The Hilbert spectrum can be obtained using adaptive mode decomposition (AMD) techniques, which are noise-robust and suitable for non-linear and non-stationary signal analysis. In this study, we developed a DeepSpeech2-based recognition system by adding a combination of STFT and HHT spectrum-based features. We propose several ways to combine those features at different levels of the neural network. All evaluations were performed using the WSJ and CHiME-4 databases. Experimental results show that combining STFT and HHT spectra leads to a 5-7% relative improvement in noisy speech recognition.
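
As a rough illustration of the quantities behind a Hilbert spectrum, the sketch below computes the analytic signal of a single oscillatory component and reads off its instantaneous amplitude and frequency. It is not the authors' pipeline: the mode decomposition step is omitted, the input is a synthetic tone standing in for one extracted mode, and the function names, sample rate, and tone frequency are illustrative assumptions. A naive DFT from the standard library is used only to keep the example self-contained.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(n^2) DFT (illustrative only)."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double positive frequencies,
    # and zero out negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    # Inverse DFT of the one-sided spectrum gives the complex analytic signal.
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_amplitude_frequency(x, sample_rate):
    """Instantaneous amplitude (per sample) and frequency in Hz (per sample pair)."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # Phase advance between consecutive samples, converted from radians to Hz.
    frequency = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * sample_rate / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return amplitude, frequency


# Hypothetical stand-in for one decomposed mode: a 10 Hz tone of amplitude 0.5.
sample_rate = 200  # Hz, assumed for illustration
tone_hz = 10.0
n = 200
signal = [0.5 * math.cos(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]

amplitude, frequency = instantaneous_amplitude_frequency(signal, sample_rate)
print(round(amplitude[0], 6), round(frequency[0], 6))
```

In a full HHT front end, this step would be applied to each mode returned by the decomposition, and the resulting amplitude and frequency tracks would be binned into a time-frequency representation that can sit alongside the STFT spectrum.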

Pages: 1 / 18

Page count: 18
