Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Cited by: 0
Author
Mlynarski, Wiktor [1]
Affiliation
[1] Max Planck Inst Math Sci, Leipzig, Germany
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
speech localization; spectrogram; binaural
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker's position and yet invariant to the sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization in the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
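The pipeline described in the abstract (ICA on binaural spectrograms, a small position-sensitive subset of ICs, a Gaussian classifier on that subset) can be illustrated with a minimal sketch. This is not the author's implementation. The data are synthetic stand-ins rather than real binaural recordings, the interaural cue vector is hypothetical, and the choice of symmetric FastICA with a tanh nonlinearity, the variance-ratio score used to pick position-sensitive ICs, and all function names are assumptions made only for illustration.

```python
import numpy as np

def make_binaural_data(n_samples, n_freq=4, n_pos=5, seed=0):
    """Toy stand-in for binaural log-spectrogram frames (NOT real ear recordings)."""
    rng = np.random.default_rng(seed)
    source = rng.laplace(size=(n_samples, n_freq))    # position-invariant source spectra
    pos = rng.integers(0, n_pos, size=n_samples)      # azimuth class labels
    cue = np.linspace(-1.0, 1.0, n_freq)              # hypothetical interaural level cue
    shift = (pos - (n_pos - 1) / 2)[:, None] * cue    # position-dependent ear filtering
    X = np.hstack([source + shift, source - shift])   # concatenated left and right ear
    return X + 0.1 * rng.standard_normal(X.shape), pos

def fit_whitening(X, n_components):
    """Return the mean and a PCA projection giving unit-variance components."""
    mean = X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
    top = np.argsort(eigval)[::-1][:n_components]
    return mean, eigvec[:, top] / np.sqrt(eigval[top])

def fast_ica(Z, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened data Z; rows of W unmix."""
    rng = np.random.default_rng(seed)
    n = Z.shape[1]
    W, _ = np.linalg.qr(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, one row per component.
        W = G.T @ Z / len(Z) - (1.0 - G**2).mean(axis=0)[:, None] * W
        U, _, Vt = np.linalg.svd(W)                   # symmetric decorrelation
        W = U @ Vt
    return W

def position_score(S, labels):
    """Fraction of each component's variance explained by the class means."""
    classes = np.unique(labels)
    means = np.array([S[labels == c].mean(axis=0) for c in classes])
    weights = np.array([(labels == c).mean() for c in classes])
    between = (weights[:, None] * (means - S.mean(axis=0)) ** 2).sum(axis=0)
    return between / S.var(axis=0)

def fit_gaussians(F, labels):
    """One full-covariance Gaussian per class (equal priors assumed)."""
    params = {}
    for c in np.unique(labels):
        Fc = F[labels == c]
        cov = np.cov(Fc, rowvar=False) + 1e-6 * np.eye(F.shape[1])
        params[c] = (Fc.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params

def predict(params, F):
    """Assign each row of F to the class with the highest Gaussian log-likelihood."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, prec, logdet = params[c]
        d = F - mu
        scores.append(-0.5 * (np.einsum("ij,jk,ik->i", d, prec, d) + logdet))
    return np.array(classes)[np.argmax(scores, axis=0)]

X, y = make_binaural_data(6000)
X_train, y_train, X_test, y_test = X[:5000], y[:5000], X[5000:], y[5000:]

mean, W_pca = fit_whitening(X_train, n_components=5)
W_ica = fast_ica((X_train - mean) @ W_pca)

def unmix(data):
    """Project frames onto the learned independent components."""
    return (data - mean) @ W_pca @ W_ica.T

S_train = unmix(X_train)
keep = np.argsort(position_score(S_train, y_train))[::-1][:2]  # position-sensitive ICs
model = fit_gaussians(S_train[:, keep], y_train)
accuracy = (predict(model, unmix(X_test)[:, keep]) == y_test).mean()
print(f"selected ICs: {keep}, test accuracy: {accuracy:.2f}")
```

In this toy setting the components left out of `keep` play the role of the position-invariant ICs mentioned in the abstract; with real data the inputs would be binaural speech spectrograms rather than the synthetic frames used here.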
Pages: 2938-2941
Number of pages: 4