A Deep Neural Network Approach for Missing-Data Mask Estimation on Dual-Microphone Smartphones: Application to Noise-Robust Speech Recognition

Cited by: 0
Authors
Lopez-Espejo, Ivan [1]
Gonzalez, Jose A. [2]
Gomez, Angel M. [1]
Peinado, Antonio M. [1]
Affiliations
[1] Univ Granada, Dept Signal Theory Telemat & Commun, E-18071 Granada, Spain
[2] Univ Sheffield, Dept Comp Sci, Sheffield S10 2TN, S Yorkshire, England
Source
ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014 | 2014, Vol. 8854
Keywords
Dual-microphone; Robust speech recognition; Mask estimation; Smartphone; Deep neural network; Missing data imputation;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The inclusion of two or more microphones in smartphones is becoming quite common. These microphones were originally intended for noise reduction, and little benefit is yet being drawn from them for noise-robust automatic speech recognition (ASR). In this paper we propose a novel system to estimate missing-data masks for robust ASR on dual-microphone smartphones. The system is based on deep neural networks (DNNs), which have proven to be a powerful tool in ASR in several ways. To assess the performance of the proposed technique, spectral reconstruction experiments are carried out on a dual-channel database derived from Aurora-2. Our results demonstrate that the DNN is better able to exploit the dual-channel information and yields an improvement in word accuracy of more than 6% over state-of-the-art single-channel mask estimation techniques.
Pages: 119-128
Page count: 10