Mask estimation for missing data speech recognition based on statistics of binaural interaction

Cited by: 38
Authors
Harding, S [1 ]
Barker, J [1 ]
Brown, GJ [1 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2006, Vol. 14, No. 1
Keywords
automatic speech recognition; binaural; computational auditory scene analysis (CASA); interaural level differences (ILD); interaural time differences (ITD); missing data; reverberation;
DOI
10.1109/TSA.2005.860354
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Classification Code
070206; 082403;
Abstract
This paper describes a perceptually motivated computational auditory scene analysis (CASA) system that combines sound separation according to spatial location with the "missing data" approach for robust speech recognition in noise. Missing data time-frequency masks are created using probability distributions based on estimates of interaural time and level differences (ITD and ILD) for mixed utterances in reverberated conditions; these masks indicate which regions of the spectrum constitute reliable evidence of the target speech signal. A number of experiments compare the relative efficacy of the binaural cues when used individually and in combination. We also investigate the ability of the system to generalize to acoustic conditions not encountered during training. Performance on a continuous digit recognition task using this method is found to be good, even in a particularly challenging environment with three concurrent male talkers.
Pages: 58-67
Number of pages: 10
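
The abstract describes mask estimation as labelling each time-frequency unit reliable or missing according to probability distributions over the binaural cues (ITD and ILD). The Python sketch below only illustrates that idea and is not the authors' implementation: the Gaussian cue models, the channel and frame dimensions, the straight-ahead target assumption, and the 0.5 threshold are all assumptions introduced so the example is self-contained and runnable. In the paper, the distributions are instead learned from mixed, reverberated training data.

```python
import numpy as np

# Toy stand-in for the trained cue distributions: the paper uses probability
# distributions over ITD/ILD estimated from reverberant training mixtures;
# here hypothetical Gaussians are used so the sketch runs end to end.
N_CHANNELS, N_FRAMES = 32, 100            # gammatone channels x time frames
rng = np.random.default_rng(0)

# Hypothetical binaural cue estimates for each time-frequency unit.
itd = rng.normal(0.0, 0.3e-3, (N_CHANNELS, N_FRAMES))   # seconds
ild = rng.normal(0.0, 3.0, (N_CHANNELS, N_FRAMES))      # dB

def cue_likelihood(cue, mean, std):
    """Unnormalised Gaussian likelihood that a cue value belongs to the target."""
    return np.exp(-0.5 * ((cue - mean) / std) ** 2)

# Assume the target talker is straight ahead: ITD near 0 s, ILD near 0 dB.
p_itd = cue_likelihood(itd, mean=0.0, std=0.2e-3)
p_ild = cue_likelihood(ild, mean=0.0, std=2.0)

# Combine the two cues (treated as independent) and threshold to obtain a
# binary missing-data mask: 1 = reliable target-dominated evidence, 0 = missing.
p_target = p_itd * p_ild
mask = (p_target > 0.5).astype(float)

print("proportion of reliable time-frequency units:", mask.mean())
```

In the system described by the abstract, such a mask would then be passed, together with the acoustic features, to a missing-data speech recogniser that treats the zero-valued units as unreliable evidence.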