HMM-based mask estimation for a speech recognition front-end using computational auditory scene analysis

Cited by: 0
Authors
Park, Ji Hun [1]
Yoon, Jae Sam [1]
Kim, Hong Kook [1]
Affiliations
[1] GIST, Dept Informat & Commun, Kwangju 500712, South Korea
Source
2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS | 2008
Keywords
computational auditory scene analysis; mask estimation; hidden Markov model; speech recognition; noise robustness;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In this paper, we propose a new mask estimation method for computational auditory scene analysis (CASA) of speech using two microphones. The proposed method is based on a hidden Markov model (HMM) so as to exploit the observation that mask information should be correlated over contiguous analysis frames. In other words, an HMM is used to estimate the mask information, represented as the interaural time difference (ITD) and the interaural level difference (ILD) of the two channel signals, and the estimated mask information is then employed to separate the desired speech from noisy speech. To show the effectiveness of the proposed mask estimation, we compare its speech recognition performance with that of a Gaussian kernel-based estimation method. The proposed HMM-based mask estimation method provided an average word error rate reduction of 69.14% compared with the Gaussian kernel-based mask estimation method.
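The idea that mask decisions should be correlated across contiguous frames can be illustrated with a generic two-state HMM decoded by the Viterbi algorithm. The sketch below is not the authors' model: the abstract does not specify the HMM topology, the emission model, or the decoding procedure. The function name `viterbi_mask`, the two states (0 = noise-dominant, 1 = speech-dominant), the self-transition probability `p_stay`, and the per-frame log-likelihood pairs (which in practice might be derived from ITD/ILD features) are all assumptions made for illustration.

```python
import math


def viterbi_mask(log_lik, p_stay=0.9, p_speech0=0.5):
    """Decode a binary mask for one frequency channel (illustrative only).

    log_lik: list of (loglik_noise, loglik_speech) pairs, one per frame.
             How these are computed from ITD/ILD is assumed, not shown.
    p_stay:  hypothetical probability of keeping the same state between
             adjacent frames; values above 0.5 favor smooth masks.
    p_speech0: hypothetical prior probability of starting in the speech state.
    Returns a list of 0/1 labels (1 = speech-dominant), one per frame.
    """
    if not log_lik:
        return []

    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)
    # trans[prev][cur] is the log transition probability.
    trans = [[log_stay, log_switch], [log_switch, log_stay]]

    # Initial scores: log prior plus first-frame log-likelihood.
    score = [math.log(1.0 - p_speech0) + log_lik[0][0],
             math.log(p_speech0) + log_lik[0][1]]
    backptr = []

    for obs in log_lik[1:]:
        new_score, ptr = [], []
        for cur in (0, 1):
            cands = [score[prev] + trans[prev][cur] for prev in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + obs[cur])
        score = new_score
        backptr.append(ptr)

    # Backtrace from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Made-up log-likelihoods: frame 2 weakly favors noise, the rest favor speech.
frames = [(-3.0, -0.1), (-3.0, -0.1), (-0.5, -1.0), (-3.0, -0.1), (-3.0, -0.1)]
smoothed = viterbi_mask(frames)                # temporal prior applied
framewise = viterbi_mask(frames, p_stay=0.5)   # no preference for staying
```

With `p_stay=0.5` the transitions carry no information, so the decoder reduces to a frame-by-frame decision; with a larger `p_stay`, an isolated weakly contradictory frame is absorbed into its neighbors' label. This is only meant to show why a temporal model can change the estimated mask relative to independent per-frame estimation.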
Pages: 177-180
Page count: 4