HMM-based mask estimation for a speech recognition front-end using computational auditory scene analysis

Cited by: 0
Authors
Park, Ji Hun [1]
Yoon, Jae Sam [1]
Kim, Hong Kook [1]
Affiliations
[1] GIST, Dept Informat & Commun, Kwangju 500712, South Korea
Source
2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS | 2008
Keywords
computational auditory scene analysis; mask estimation; hidden Markov model; speech recognition; noise robustness
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we propose a new mask estimation method for computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) so that it can exploit the observation that mask information should be correlated over contiguous analysis frames. In other words, the HMM is used to estimate the mask information, represented by the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask information is then used to separate the desired speech from the noisy speech. To show the effectiveness of the proposed mask estimation, we compare the speech recognition performance of the proposed method with that of a Gaussian kernel-based estimation method. The proposed HMM-based mask estimation method provided an average word error rate reduction of 69.14% compared with the Gaussian kernel-based mask estimation method.
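The abstract does not give the model details, so the following is only a minimal sketch of the general idea that an HMM can enforce frame-to-frame correlation in a mask. It assumes a two-state HMM per frequency channel (1 = target-dominant, 0 = noise-dominant), independent Gaussian emissions over ITD and ILD, and a fixed self-transition probability. The function names, the parameter values, and the units (ITD in ms, ILD in dB) are all hypothetical and are not taken from the paper.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_mask(itd, ild, params, p_stay=0.9):
    """Decode a binary mask (1 = target-dominant) for one frequency channel.

    itd, ild : per-frame cue sequences of equal length
    params   : {state: ((itd_mean, itd_var), (ild_mean, ild_var))}
    p_stay   : assumed self-transition probability; values above 0.5
               favour masks that stay constant over contiguous frames
    """
    states = (0, 1)
    log_trans = {
        (i, j): math.log(p_stay if i == j else 1.0 - p_stay)
        for i in states for j in states
    }

    def emit(state, t):
        (m_itd, v_itd), (m_ild, v_ild) = params[state]
        # Simplification: ITD and ILD are independent given the state.
        return log_gauss(itd[t], m_itd, v_itd) + log_gauss(ild[t], m_ild, v_ild)

    if not itd:
        return []

    # Uniform initial state distribution (an assumption).
    score = {s: math.log(0.5) + emit(s, 0) for s in states}
    back = []
    for t in range(1, len(itd)):
        new_score, ptr = {}, {}
        for j in states:
            best = max(states, key=lambda i: score[i] + log_trans[(i, j)])
            ptr[j] = best
            new_score[j] = score[best] + log_trans[(best, j)] + emit(j, t)
        back.append(ptr)
        score = new_score

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical cue statistics: target near (0 ms, 0 dB), noise near (0.5 ms, 6 dB).
params = {1: ((0.0, 0.01), (0.0, 4.0)), 0: ((0.5, 0.01), (6.0, 4.0))}
itd = [0.02, -0.01, 0.28, 0.01, 0.49, 0.52, 0.47]
ild = [0.5, -0.3, 3.2, 0.2, 5.8, 6.3, 5.5]

smoothed = viterbi_mask(itd, ild, params)               # temporal prior on
framewise = viterbi_mask(itd, ild, params, p_stay=0.5)  # temporal prior off
print(smoothed)
print(framewise)
```

With these made-up values, the third frame is ambiguous and leans slightly toward the noise state on its own. Setting `p_stay=0.5` removes the temporal prior, so the decoder reduces to a frame-by-frame decision and flips the mask for that single frame, whereas the default `p_stay=0.9` keeps the mask on the target state until the cues clearly change.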
Pages: 177-180
Page count: 4