Subband Weighting for Binaural Speech Source Localization

Cited by: 2

Authors:

Karthik, Girija Ramesan [1]

Suresh, Parth [2]

Ghosh, Prasanta Kumar [1]

Affiliations:

[1] Indian Inst Sci IISc, Elect Engn, Bengaluru 560012, India

[2] TKM Coll Engn, Comp Sci & Engn, Kollam 691005, India

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

gammatone filters; interaural time difference; warping function; SOUND SOURCE LOCALIZATION; PROBABILISTIC MODEL; SELECTION;

DOI:

10.21437/Interspeech.2018-2173

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider the task of speech source localization from a binaural recording using the interaural time difference (ITD). A typical approach is to process binaural speech using gammatone filters and calculate the frame-level ITD in each subband. During training, the ITDs in each gammatone subband are statistically modelled using Gaussian mixture models (GMMs) for every direction. Given a binaural test speech signal, the source is localized using the maximum likelihood (ML) criterion. In this work, we propose a subband weighting scheme in which subband likelihoods are weighted according to their reliability. We measure the reliability of a subband using the average frame-level localization error obtained for that subband. These reliability values are used as the weights for each subband likelihood before the likelihoods are combined for ML estimation. We also introduce non-linear warping of these weights to accommodate and analyse a larger space of possible subband weights. Experiments on Subject_003 from the CIPIC database show that weighting the subbands performs better than the unweighted scheme of combining likelihoods.
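The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how error is mapped to a weight, what the warping function is, or in which domain the likelihoods are combined. The sketch assumes that reliability is the inverse of the average localization error, that warping is a power law with a hypothetical exponent `gamma`, and that weights are applied to per-subband log-likelihoods already summed over frames. The function names and the toy numbers are likewise illustrative.

```python
import numpy as np


def reliability_weights(avg_error, gamma=1.0, eps=1e-6):
    """Turn per-subband average localization errors into normalized weights.

    Assumption (not from the paper): reliability = 1 / (error + eps),
    warped non-linearly by raising it to the power `gamma`.
    gamma = 0 gives uniform weights, i.e. the unweighted scheme.
    """
    avg_error = np.asarray(avg_error, dtype=float)
    reliability = 1.0 / (avg_error + eps)
    warped = reliability ** gamma
    return warped / warped.sum()


def localize(subband_loglik, weights=None):
    """Pick the direction with the largest combined log-likelihood.

    subband_loglik: array of shape (n_subbands, n_directions), where each
    entry is a subband's log-likelihood for one candidate direction.
    weights: one weight per subband; None means equal weights.
    """
    subband_loglik = np.asarray(subband_loglik, dtype=float)
    if weights is None:
        weights = np.ones(subband_loglik.shape[0])
    combined = np.asarray(weights, dtype=float) @ subband_loglik
    return int(np.argmax(combined))


# Toy data: three subbands, three candidate directions.
# The third subband is unreliable and strongly favours direction 2.
loglik = [
    [-1.0, -2.0, -3.0],
    [-1.2, -2.5, -3.0],
    [-9.0, -8.0, -1.0],
]
avg_error = [5.0, 5.0, 50.0]  # hypothetical per-subband errors in degrees

unweighted = localize(loglik)
weighted = localize(loglik, reliability_weights(avg_error))
print(unweighted, weighted)
```

In this toy case the unweighted sum is dominated by the unreliable subband, while the reliability weights suppress it and the first two subbands decide the estimate. Varying `gamma` is one simple way to explore a family of weightings between uniform and strongly error-dependent.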

Citation

Pages: 861-865

Page count: 5

