Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Cited by: 0
Author
Mlynarski, Wiktor [1]
Affiliation
[1] Max Planck Inst Math Sci, Leipzig, Germany
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
speech localization; spectrogram; binaural
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker's position yet invariant to the sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization in the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
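The classification step named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the feature vectors below are synthetic stand-ins for IC activations, the three azimuth values are hypothetical, and the per-class Gaussians are assumed to have diagonal covariance and equal priors. Only the first few features are made position-dependent, mimicking the small informative subset of ICs; the rest have position-invariant distributions.

```python
import numpy as np


def fit_gaussian_classifier(X, y):
    """Estimate a mean and a diagonal variance vector for each class."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps the variances strictly positive.
    variances = np.stack([X[y == c].var(axis=0) + 1e-6 for c in classes])
    return classes, means, variances


def predict(X, classes, means, variances):
    """Assign each sample to the class with the highest log-likelihood."""
    diff = X[:, None, :] - means[None, :, :]            # (samples, classes, features)
    log_lik = -0.5 * (
        np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]
    ).sum(axis=2)                                        # (samples, classes)
    return classes[np.argmax(log_lik, axis=1)]


def make_synthetic_features(rng, n_per_class, azimuths,
                            n_features=20, n_informative=3):
    """Hypothetical data: only the first n_informative features depend on position."""
    X, y = [], []
    for k, azimuth in enumerate(azimuths):
        feats = rng.normal(size=(n_per_class, n_features))
        feats[:, :n_informative] += 3.0 * k              # position-dependent shift
        X.append(feats)
        y.append(np.full(n_per_class, azimuth))
    return np.concatenate(X), np.concatenate(y)


rng = np.random.default_rng(0)
azimuths = np.array([-45, 0, 45])                        # hypothetical positions, degrees
X_train, y_train = make_synthetic_features(rng, 200, azimuths)
X_test, y_test = make_synthetic_features(rng, 100, azimuths)

model = fit_gaussian_classifier(X_train, y_train)
y_pred = predict(X_test, *model)
accuracy = float((y_pred == y_test).mean())
print(f"accuracy: {accuracy:.3f}")
```

In a real pipeline the synthetic features would be replaced by the activations of ICs learned from binaural spectrograms; the classifier code itself would not need to change.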
Pages: 2938-2941
Number of pages: 4