A deep neural network approach for missing-data mask estimation on dual-microphone smartphones: Application to noise-robust speech recognition

Cited by: 8

Authors:

López-Espejo, I. [1]

González, José A. [2]

Gómez, Ángel M. [1]

Peinado, Antonio M. [1]

Affiliations:

[1] Dept. of Signal Theory, Telematics and Communications, University of Granada

[2] Dept. of Computer Science, University of Sheffield

Source:

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014 / Vol. 8854

Keywords:

Deep neural network; Dual-microphone; Mask estimation; Missing data imputation; Robust speech recognition; Smartphone

DOI:

10.1007/978-3-319-13623-3_13

Abstract:

The inclusion of two or more microphones in smartphones is becoming quite common. These microphones were originally intended for noise reduction, and little benefit is yet being drawn from them for noise-robust automatic speech recognition (ASR). In this paper we propose a novel system to estimate missing-data masks for robust ASR on dual-microphone smartphones. The system is based on deep neural networks (DNNs), which have proven to be a powerful tool in the field of ASR in different ways. To assess the performance of the proposed technique, spectral reconstruction experiments are carried out on a dual-channel database derived from Aurora-2. Our results demonstrate that the DNN is better able to exploit the dual-channel information and yields an improvement in word accuracy of more than 6% over state-of-the-art single-channel mask estimation techniques. © Springer International Publishing Switzerland 2014.

Pages: 119 - 128

Number of pages: 9

9 items in total

[1] A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition
Lopez-Espejo, Ivan
Gonzalez, Jose A.
Gomez, Angel M.
Peinado, Antonio M.
ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 119 - 128
[2] Deep Neural Network-Based Noise Estimation for Robust ASR in Dual-Microphone Smartphones
Lopez-Espejo, Ivan
Peinado, Antonio M.
Gomez, Angel M.
Martin-Donas, Juan M.
ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2016, 2016, 10077 : 117 - 127
[3] FEATURE ENHANCEMENT FOR ROBUST SPEECH RECOGNITION ON SMARTPHONES WITH DUAL-MICROPHONE
Lopez-Espejo, Ivan
Gomez, Angel M.
Gonzalez, Jose A.
Peinado, Antonio M.
2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 21 - 25
[4] A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks
Li, Bo
Sim, Khe Chai
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (08) : 1296 - 1305
[5] Maximum Confidence Measure Based Interaural Phase Difference Estimation for Noise Masking in Dual-Microphone Robust Speech Recognition
Liao, Hsien-Cheng
Liao, Yuan-Fu
Lee, Chin-Hui
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 480 - +
[6] EXPLOITING SYNCHRONY SPECTRA AND DEEP NEURAL NETWORKS FOR NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION
Ma, Ning
Marxer, Ricard
Barker, Jon
Brown, Guy J.
2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 490 - 495
[7] Employing Robust Principal Component Analysis for Noise-Robust Speech Feature Extraction in Automatic Speech Recognition with the Structure of a Deep Neural Network
Hung, Jeih-weih
Lin, Jung-Shan
Wu, Po-Jen
APPLIED SYSTEM INNOVATION, 2018, 1 (03) : 1 - 14
[8] A PITCH BASED NOISE ESTIMATION TECHNIQUE FOR ROBUST SPEECH RECOGNITION WITH MISSING DATA
Morales-Cordovilla, Juan A.
Ma, Ning
Sanchez, Victoria
Carmona, Jose L.
Peinado, Antonio M.
Barker, Jon
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4808 - 4811
[9] Incorporating a Generative Front-end Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition
Kundu, Souvik
Sim, Khe Chai
Gales, Mark
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2359 - 2363
