MODEL-BASED NOISE REDUCTION LEVERAGING FREQUENCY-WISE CONFIDENCE METRIC FOR IN-CAR SPEECH RECOGNITION

Cited by: 0
Authors
Ichikawa, Osamu [1 ]
Rennie, Steven J.
Fukuda, Takashi [1 ]
Nishimura, Masafumi [1 ]
Affiliations
[1] IBM Res Tokyo, Yamato 2428502, Japan
Source
2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2012
Keywords
Harmonic analysis; speech enhancement; robust speech recognition; model-based noise reduction; missing feature;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Model-based approaches for noise reduction effectively improve the performance of automatic speech recognition in noisy environments. Most of them use the minimum mean square error (MMSE) criterion for de-noised speech estimates. In general, an observation has speech-dominant bands and noise-dominant bands in the Mel spectral domain. This paper introduces a method to add weight to speech-dominated bands when evaluating the posterior probability of each speech state, as these bands are generally more reliable. To leverage high-resolution information in the Mel domain, we use Local Peak Weight (LPW) as the confidence metric for the degree of speech dominance. This information is also used to regulate the amount of compensation that is applied to each frequency band during feature reconstruction under an integrated probabilistic model. The method produced relative word error rate improvements of up to 33.8% over the baseline MMSE method on an isolated word task with car noise.
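The idea of weighting speech-dominated bands when evaluating state posteriors can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a diagonal-covariance Gaussian mixture over log-Mel features and simply scales each band's log-likelihood by a confidence weight in [0, 1]. The function name `weighted_state_posteriors` and all parameter names are hypothetical, and the weights are supplied by the caller rather than computed as LPW.

```python
import numpy as np

def weighted_state_posteriors(y, means, variances, priors, band_weights):
    """Posterior over K speech states for one frame (illustrative only).

    y            : (D,)   observed log-Mel feature vector
    means        : (K, D) per-state means of a diagonal Gaussian mixture
    variances    : (K, D) per-state variances
    priors       : (K,)   state prior probabilities
    band_weights : (D,)   assumed per-band confidence in [0, 1]; a stand-in
                          for a metric such as LPW, which is not computed here
    """
    # Per-band Gaussian log-likelihoods, shape (K, D).
    log_lik = -0.5 * (np.log(2.0 * np.pi * variances)
                      + (y - means) ** 2 / variances)
    # Scale each band's contribution by its confidence, then sum over bands.
    log_post = np.log(priors) + (log_lik * band_weights).sum(axis=1)
    # Normalize in a numerically stable way.
    log_post -= log_post.max()
    post = np.exp(log_post)
    return post / post.sum()

# Toy example: two states that differ only in the second band.
means = np.array([[0.0, 0.0], [0.0, 5.0]])
variances = np.ones((2, 2))
priors = np.array([0.5, 0.5])
y = np.array([0.0, 5.0])

uniform = weighted_state_posteriors(y, means, variances, priors, np.array([1.0, 1.0]))
masked = weighted_state_posteriors(y, means, variances, priors, np.array([1.0, 0.0]))
```

With all weights equal to 1 the function reduces to the ordinary mixture posterior. Setting a band's weight to 0 removes that band from the decision, so in the toy example `masked` no longer distinguishes the two states.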
Pages: 4921-4924
Page count: 4