A deep neural network approach for missing-data mask estimation on dual-microphone smartphones: Application to noise-robust speech recognition

Cited by: 8
Authors
López-Espejo, I. [1]
González, José A. [2]
Gómez, Ángel M. [1]
Peinado, Antonio M. [1]
Affiliations
[1] Dept. of Signal Theory, Telematics and Communications, University of Granada
[2] Dept. of Computer Science, University of Sheffield
Source
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | 2014 / Vol. 8854
Keywords
Deep neural network; Dual-microphone; Mask estimation; Missing data imputation; Robust speech recognition; Smartphone
DOI
10.1007/978-3-319-13623-3_13
Abstract
The inclusion of two or more microphones in smartphones is becoming quite common. These microphones were originally intended for noise reduction, and little benefit is yet being derived from them for noise-robust automatic speech recognition (ASR). In this paper we propose a novel system to estimate missing-data masks for robust ASR on dual-microphone smartphones. The system is based on deep neural networks (DNNs), which have proven to be a powerful tool in several areas of ASR. To assess the performance of the proposed technique, spectral reconstruction experiments are carried out on a dual-channel database derived from Aurora-2. Our results demonstrate that the DNN is better able to exploit the dual-channel information and yields an improvement in word accuracy of more than 6% over state-of-the-art single-channel mask estimation techniques. © Springer International Publishing Switzerland 2014.
Pages: 119-128
Page count: 9