Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Cited by: 0

Authors:

Mlynarski, Wiktor [1]

Affiliations:

[1] Max Planck Inst Math Sci, Leipzig, Germany

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speech localization; spectrogram; binaural;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker's position yet invariant to the sound level and spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization in the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
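The classification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the features are synthetic stand-ins for position-dependent IC activations, the azimuth classes, feature count, and class means are hypothetical, and a diagonal-covariance Gaussian per class is assumed, since the abstract does not specify the covariance structure.

```python
import numpy as np

def fit_gaussian_classifier(features, labels):
    """Fit one diagonal-covariance Gaussian per class (here: per azimuth label)."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # A small floor keeps every variance strictly positive.
    variances = np.stack([features[labels == c].var(axis=0) for c in classes]) + 1e-6
    return classes, means, variances

def predict(features, classes, means, variances):
    """Return, for each sample, the class with the highest Gaussian log-likelihood."""
    diff = features[:, None, :] - means[None, :, :]
    log_lik = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    return classes[np.argmax(log_lik, axis=1)]

# Synthetic stand-in data (assumption): 4 "position-dependent" features whose
# mean shifts with azimuth. These are NOT the paper's ICs or its data.
rng = np.random.default_rng(0)
azimuths = np.array([-60, -30, 0, 30, 60])  # hypothetical azimuth classes, degrees
n_per_class, n_features = 200, 4
class_means = np.outer(azimuths / 30.0, np.ones(n_features)) * 3.0
X = np.concatenate(
    [m + rng.normal(size=(n_per_class, n_features)) for m in class_means]
)
y = np.repeat(azimuths, n_per_class)

# Hold out 200 of the 1000 samples for evaluation.
perm = rng.permutation(len(y))
train, test = perm[:800], perm[800:]
model = fit_gaussian_classifier(X[train], y[train])
accuracy = np.mean(predict(X[test], *model) == y[test])
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real pipeline the rows of `X` would be the activations of the position-dependent ICs for each spectrogram frame or segment, rather than the synthetic values generated here.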

Pages: 2938-2941

Page count: 4
