Mask estimation for missing data speech recognition based on statistics of binaural interaction

Cited by: 38
Authors
Harding, S [1 ]
Barker, J [1 ]
Brown, GJ [1 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006, Vol. 14, No. 1
Keywords
automatic speech recognition; binaural; computational auditory scene analysis (CASA); interaural level differences (ILD); interaural time differences (ITD); missing data; reverberation;
DOI
10.1109/TSA.2005.860354
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Classification Code
070206; 082403;
Abstract
This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberated conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.
Pages: 58-67
Number of pages: 10
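
The abstract describes mask estimation as labelling each time-frequency unit reliable or missing according to probability distributions over the binaural cues (ITD and ILD). The Python sketch below only illustrates that idea and is not the authors' implementation: the Gaussian cue models, the channel and frame dimensions, the straight-ahead target assumption, and the 0.5 threshold are all assumptions introduced so the example is self-contained and runnable. In the paper, the distributions are instead learned from mixed, reverberated training data.

```python
import numpy as np

# Toy stand-in for the trained cue distributions: the paper uses probability
# distributions over ITD/ILD estimated from reverberant training mixtures;
# here hypothetical Gaussians are used so the sketch runs end to end.
N_CHANNELS, N_FRAMES = 32, 100            # gammatone channels x time frames
rng = np.random.default_rng(0)

# Hypothetical binaural cue estimates for each time-frequency unit.
itd = rng.normal(0.0, 0.3e-3, (N_CHANNELS, N_FRAMES))   # seconds
ild = rng.normal(0.0, 3.0, (N_CHANNELS, N_FRAMES))      # dB

def cue_likelihood(cue, mean, std):
    """Unnormalised Gaussian likelihood that a cue value belongs to the target."""
    return np.exp(-0.5 * ((cue - mean) / std) ** 2)

# Assume the target talker is straight ahead: ITD near 0 s, ILD near 0 dB.
p_itd = cue_likelihood(itd, mean=0.0, std=0.2e-3)
p_ild = cue_likelihood(ild, mean=0.0, std=2.0)

# Combine the two cues (treated as independent) and threshold to obtain a
# binary missing-data mask: 1 = reliable target-dominated evidence, 0 = missing.
p_target = p_itd * p_ild
mask = (p_target > 0.5).astype(float)

print("proportion of reliable time-frequency units:", mask.mean())
```

In the system described by the abstract, such a mask would then be passed, together with the acoustic features, to a missing-data speech recogniser that treats the zero-valued units as unreliable evidence.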