SNR-Based Mask Compensation for Computational Auditory Scene Analysis Applied to Speech Recognition in a Car Environment

Authors
Park, Ji Hun [1 ]
Kim, Seon Man [1 ]
Yoon, Jae Sam [1 ]
Kim, Hong Kook [1 ]
Lee, Sung Joo [2 ]
Lee, Yunkeun [2 ]
Affiliations
[1] Gwangju Inst Sci & Technol, Sch Informat & Commun, Kwangju 500712, South Korea
[2] Elect & Telecommun Res Inst, Speech Proc Team, Speech & Language Informat Res Div, Daejeon 305350, South Korea
Source
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010
Keywords
Speech recognition; speech separation; computational auditory scene analysis; mask compensation; beamforming;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we propose a computational auditory scene analysis (CASA)-based front-end for two-microphone speech recognition in a car environment. An important issue in CASA is the accurate estimation of the mask information used to separate target speech from noisy multi-microphone speech. To this end, the time-frequency mask information is compensated using the signal-to-noise ratio (SNR) obtained from a beamformer, in order to adjust the amount of noise included in the noisy speech. We evaluate the performance of an automatic speech recognition (ASR) system employing a CASA-based front-end with the proposed mask compensation method, and compare it with systems employing a CASA-based front-end without mask compensation and a beamforming-based front-end. The CASA-based front-end achieves an average word error rate (WER) reduction of 8.57% when the proposed mask compensation method is applied. In addition, the CASA-based front-end with the proposed method provides a relative WER reduction of 26.52% compared with the beamforming-based front-end.
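The abstract reports one WER reduction explicitly as relative and another without that qualifier. As a reading aid, the following minimal sketch shows how absolute and relative WER reductions are conventionally computed. The function names and the input values are hypothetical illustrations and are not taken from the paper, and the record does not state which definition the 8.57% figure uses.

```python
def absolute_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute reduction in percentage points: baseline minus new."""
    return baseline_wer - new_wer


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs in percent, chosen only for illustration.
    baseline, improved = 20.0, 15.0
    print(absolute_wer_reduction(baseline, improved))  # 5.0 percentage points
    print(relative_wer_reduction(baseline, improved))  # 25.0 percent
```

With these illustrative inputs, the same improvement reads as 5 percentage points in absolute terms and 25% in relative terms, which is why the two kinds of figures are not directly comparable.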
Pages: 725 / +
Page count: 2