AUDIO INPUTS FOR ACTIVE SPEAKER DETECTION AND LOCALIZATION VIA MICROPHONE ARRAY

Cited by: 2
Authors
Berghi, Davide [1]
Jackson, Philip J. B. [1]
Affiliations
[1] Univ Surrey, Ctr Vis Speech & Signal Proc, Guildford, Surrey, England
Source
2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA | 2023
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Features extraction; active speaker detection and localization; microphone array; multichannel audio; sound event localization
DOI
10.1109/WASPAA58266.2023.10248185
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This study considers the problem of detecting and locating an active talker's horizontal position from multichannel audio captured by a microphone array. We refer to this as active speaker detection and localization (ASDL). Our goal was to investigate the performance of spatial acoustic features extracted from the multichannel audio as the input of a convolutional recurrent neural network (CRNN), in relation to the number of channels employed and additive noise. To this end, experiments were conducted to compare the generalized cross-correlation with phase transform (GCC-PHAT), the spatial cue-augmented log-spectrogram (SALSA) features, and a recently proposed beamforming method, evaluating their robustness to various noise intensities. The array aperture and sampling density were tested by taking subsets from the 16-microphone array. Results and tests of statistical significance demonstrate the microphones' contribution to performance on the TragicTalkers dataset, which offers opportunities to investigate audio-visual approaches in the future.
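As an illustration of one of the features named in the abstract, the following is a minimal sketch of GCC-PHAT for a single microphone pair. It is not taken from the paper: the function name `gcc_phat`, the sampling rate, and the synthetic test signal are assumptions made for this example, and the paper's actual pipeline (which feeds such features to a CRNN) is not reproduced here.

```python
import numpy as np


def gcc_phat(x, y, fs):
    """Estimate the delay of y relative to x (in seconds) with GCC-PHAT.

    Hypothetical helper for illustration; a positive result means that
    y lags behind x.
    """
    n = len(x) + len(y)                 # zero-pad so the correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)              # cross-power spectrum
    cross /= np.abs(cross) + 1e-12      # phase transform: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)       # generalized cross-correlation over lags
    max_shift = n // 2
    # Reorder so that lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs, cc


# Synthetic example (assumed values): white noise delayed by 5 samples
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x[:-5]))
tau, _ = gcc_phat(x, y, fs)
print(f"estimated delay: {tau * fs:.0f} samples")
```

In a microphone array, one such correlation sequence per microphone pair is typically computed frame by frame, so the number of pairs grows with the number of channels used.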
Pages: 5