Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Citations: 0
Authors
Mlynarski, Wiktor [1]
Affiliation
[1] Max Planck Inst Math Sci, Leipzig, Germany
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
speech localization; spectrogram; binaural;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extraction of features which are informative about the speaker's position and yet invariant to sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of learned Independent Components (ICs) captures signal structure imposed by the outer ears. A Gaussian Classifier trained on those features performs accurate localization in the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
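The classification step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a diagonal-covariance Gaussian per azimuth class with equal class priors, and it replaces real ICA coefficients with synthetic two-dimensional features whose means shift with azimuth. The names `fit_gaussian_classifier`, `predict`, and `synth_feature`, as well as the three azimuth values, are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_gaussian_classifier(features, labels):
    """Estimate a per-class mean and variance for every feature dimension."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # Small constant keeps the variance strictly positive.
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / n + 1e-6
            for d in range(dims)
        ]
        model[y] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log density of x under an axis-aligned (diagonal) Gaussian."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def predict(model, x):
    """Return the class with the highest log-likelihood (equal priors assumed)."""
    return max(model, key=lambda y: log_likelihood(x, *model[y]))


# Synthetic stand-in for position-dependent IC coefficients (an assumption,
# not data from the paper): feature means shift with the azimuth in degrees.
rng = random.Random(0)
azimuths = [-45, 0, 45]


def synth_feature(az):
    return [az / 45.0 + rng.gauss(0, 0.2), -az / 90.0 + rng.gauss(0, 0.2)]


train_x, train_y = [], []
for az in azimuths:
    for _ in range(200):
        train_x.append(synth_feature(az))
        train_y.append(az)

model = fit_gaussian_classifier(train_x, train_y)

test_set = [(synth_feature(az), az) for az in azimuths for _ in range(50)]
accuracy = sum(predict(model, x) == az for x, az in test_set) / len(test_set)
print(f"accuracy on synthetic features: {accuracy:.3f}")
```

In a real pipeline the synthetic features would be replaced by the coefficients of the position-dependent ICs; whether a diagonal or full covariance matches the paper's classifier is not stated in the abstract.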
Pages: 2938-2941
Page count: 4