ROBUST LOCALISATION OF MULTIPLE SPEAKERS EXPLOITING HEAD MOVEMENTS AND MULTI-CONDITIONAL TRAINING OF BINAURAL CUES

Cited by: 0
Authors
May, Tobias [1]
Ma, Ning [2]
Brown, Guy J. [2]
Affiliations
[1] Tech Univ Denmark, Ctr Appl Hearing Res, DK-2800 Lyngby, Denmark
[2] Univ Sheffield, Dept Comp Sci, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England
Source
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP) | 2015
Keywords
binaural sound source localisation; head movements; multi-conditional training; generalisation; reverberant environments
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper addresses the problem of localising multiple competing speakers in the presence of room reverberation, where sound sources can be positioned at any azimuth in the horizontal plane. Front-back confusions can occur because interaural time differences (ITDs) and interaural level differences (ILDs) are similar in the front and rear hemifields. To reduce the number of such confusions, a machine hearing system is presented that combines supervised learning of binaural cues using multi-conditional training (MCT) with a head movement strategy. A systematic evaluation showed that this approach substantially reduced the number of front-back confusions in challenging acoustic scenarios. Moreover, the system was able to generalise to a variety of acoustic conditions not seen during training.
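The front-back ambiguity described in the abstract, and the reason a head rotation can resolve it, can be illustrated with a toy model. The sketch below is not the system presented in the paper: it assumes a simple sine-law ITD model, an anechoic single source, and hypothetical values for the inter-ear distance and the speed of sound. All function and constant names are illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value)
EAR_DISTANCE = 0.18     # m (assumed inter-ear distance)


def itd_seconds(source_azimuth_deg, head_azimuth_deg=0.0):
    """ITD under a sine-law model (an illustrative assumption).

    Azimuths are in degrees in a fixed world frame; 0 is straight ahead
    of the unrotated head and 180 is directly behind it.
    """
    relative = math.radians(source_azimuth_deg - head_azimuth_deg)
    return (EAR_DISTANCE / SPEED_OF_SOUND) * math.sin(relative)


def candidate_azimuths(itd, head_azimuth_deg=0.0):
    """World-frame azimuths consistent with one ITD: a front/back mirror pair."""
    s = max(-1.0, min(1.0, itd * SPEED_OF_SOUND / EAR_DISTANCE))
    front = math.degrees(math.asin(s))  # solution in the frontal hemifield
    back = 180.0 - front                # mirror solution in the rear hemifield
    return {
        round((front + head_azimuth_deg) % 360.0, 1),
        round((back + head_azimuth_deg) % 360.0, 1),
    }


def resolve_front_back(source_azimuth_deg, rotation_deg):
    """Keep only the azimuths consistent with the ITD before and after a head turn."""
    before = candidate_azimuths(itd_seconds(source_azimuth_deg))
    after = candidate_azimuths(
        itd_seconds(source_azimuth_deg, rotation_deg), rotation_deg
    )
    return before & after
```

In this model a source at 30 degrees and a source at 150 degrees produce the same ITD for a stationary head, so `candidate_azimuths` returns both. Calling `resolve_front_back(150.0, 20.0)` intersects the candidates from the two head orientations, which leaves only the azimuth that is consistent with both observations.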
Pages: 2679-2683
Number of pages: 5