Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Cited by: 0

Authors:

Mlynarski, Wiktor [1]

Affiliations:

[1] Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speech localization; spectrogram; binaural;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker's position yet invariant to the sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization in the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
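
The classification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a Gaussian classifier with a diagonal covariance per class and equal class priors, and it replaces the position-dependent IC features with made-up two-dimensional toy features. All function names (`fit_gaussian_classifier`, `log_likelihood`, `predict`) and the azimuth labels are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_classifier(features, labels):
    """Estimate a mean and variance per class and per feature dimension.

    Assumption: diagonal covariance and equal class priors.
    """
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        dims = len(rows[0])
        means = [sum(r[d] for r in rows) / n for d in range(dims)]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((r[d] - means[d]) ** 2 for r in rows) / n + 1e-6
            for d in range(dims)
        ]
        model[y] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log-density of x under an axis-aligned Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def predict(model, x):
    """Return the class label whose Gaussian gives x the highest likelihood."""
    return max(model, key=lambda y: log_likelihood(x, *model[y]))


# Toy stand-in for position-dependent features (NOT data from the paper):
# two features whose means shift with the azimuth label in degrees.
rng = random.Random(0)
train_x, train_y = [], []
for azimuth in (-90, 0, 90):
    for _ in range(200):
        train_x.append([rng.gauss(azimuth / 90.0, 0.2),
                        rng.gauss(-azimuth / 90.0, 0.2)])
        train_y.append(azimuth)

model = fit_gaussian_classifier(train_x, train_y)
predicted_azimuth = predict(model, [0.9, -1.1])
```

In this sketch each azimuth class is summarized by one Gaussian, and a new feature vector is assigned to the class with the highest log-likelihood. A full-covariance Gaussian, or features obtained from an actual ICA decomposition of binaural spectrograms, would slot into the same structure.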

Pages: 2938-2941

Page count: 4

