Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Cited by: 0

Authors:

Mlynarski, Wiktor [1]

Affiliations:

[1] Max Planck Inst Math Sci, Leipzig, Germany

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speech localization; spectrogram; binaural;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extraction of features which are informative about the speaker's position and yet invariant to sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of learned Independent Components (ICs) captures signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization on the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
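
The pipeline described in the abstract (ICA on binaural spectrograms, selection of a few position-dependent ICs, a Gaussian classifier on them) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy data generator, the linear "ear filter" tilt, the symmetric FastICA variant, the between-position variance score used to pick ICs, and all function names (`make_toy_frames`, `fit_ica`, `fit_gaussians`, `predict`, `run_demo`) are assumptions made for illustration only.

```python
import numpy as np

def make_toy_frames(rng, n_per_pos=400, n_freq=16, n_pos=5, n_src=6):
    """Toy binaural log-spectra: each ear = source spectrum + position-dependent filter."""
    freqs = np.linspace(0.0, 1.0, n_freq)
    basis = rng.standard_normal((n_src, n_freq))
    basis[0] = 1.0  # first basis models overall sound level
    X, y = [], []
    for p in range(n_pos):
        # Hypothetical ear filter: spectral tilt that grows with azimuth index.
        tilt = 0.5 * (p - (n_pos - 1) / 2.0) * freqs
        src = rng.laplace(size=(n_per_pos, n_src)) @ basis  # non-Gaussian source
        left = src - tilt + 0.05 * rng.standard_normal(src.shape)
        right = src + tilt + 0.05 * rng.standard_normal(src.shape)
        X.append(np.hstack([left, right]))
        y.append(np.full(n_per_pos, p))
    X, y = np.vstack(X), np.concatenate(y)
    order = rng.permutation(len(y))
    return X[order], y[order]

def fit_ica(X, n_components, rng, n_iter=200):
    """Symmetric FastICA (tanh nonlinearity) after PCA whitening."""
    mean = X.mean(axis=0)
    Xc = X - mean
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    idx = np.argsort(evals)[::-1][:n_components]
    K = evecs[:, idx] / np.sqrt(evals[idx])  # whitening matrix
    Z = Xc @ K
    W = np.linalg.qr(rng.standard_normal((n_components, n_components)))[0]
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        W_new = (G.T @ Z) / len(Z) - (1.0 - G ** 2).mean(axis=0)[:, None] * W
        U, _, Vt = np.linalg.svd(W_new)  # symmetric decorrelation
        W = U @ Vt
    return mean, K @ W.T  # IC activations: (X - mean) @ unmix

def fit_gaussians(F, y):
    """One full-covariance Gaussian per position."""
    params = {}
    for c in np.unique(y):
        Fc = F[y == c]
        cov = np.cov(Fc, rowvar=False) + 1e-6 * np.eye(F.shape[1])
        params[int(c)] = (Fc.mean(axis=0), cov)
    return params

def predict(params, F):
    """Assign each frame to the position with the highest Gaussian log-likelihood."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, cov = params[c]
        diff = F - mu
        maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
        scores.append(-0.5 * (maha + np.linalg.slogdet(cov)[1]))
    return np.array(classes)[np.argmax(scores, axis=0)]

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    X, y = make_toy_frames(rng)
    n_train = len(y) // 2
    mean, unmix = fit_ica(X[:n_train], n_components=8, rng=rng)
    S_train = (X[:n_train] - mean) @ unmix
    S_test = (X[n_train:] - mean) @ unmix
    # Assumed selection rule: keep ICs whose class means vary most across positions.
    class_means = np.array([S_train[y[:n_train] == c].mean(axis=0)
                            for c in np.unique(y[:n_train])])
    score = class_means.var(axis=0) / S_train.var(axis=0)
    keep = np.argsort(score)[::-1][:2]
    params = fit_gaussians(S_train[:, keep], y[:n_train])
    pred = predict(params, S_test[:, keep])
    return float((pred == y[n_train:]).mean()), keep

accuracy, kept = run_demo()
print(f"held-out accuracy on toy data: {accuracy:.3f}, kept ICs: {kept.tolist()}")
```

In this toy setting the position-dependent tilt is antisymmetric across the two ears while the source spectrum is shared, so the position information concentrates in very few components; real binaural spectrograms filtered by head-related transfer functions would be considerably less clean.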


Pages: 2938-2941

Page count: 4
