Learning Binaural Spectrogram Features for Azimuthal Speaker Localization

Cited by: 0
Authors
Mlynarski, Wiktor [1]
Affiliation
[1] Max Planck Inst Math Sci, Leipzig, Germany
Source
14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013
Keywords
speech localization; spectrogram; binaural;
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker's position yet invariant to the sound level and to the spectral modulation present in the signal. This paper demonstrates that this can be achieved by applying Independent Component Analysis (ICA) to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization in the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
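The classification stage mentioned in the abstract can be illustrated with a minimal sketch of a Gaussian classifier over azimuth labels. This is not the author's implementation: it assumes the position-informative IC activations have already been computed and selected (the ICA step is not shown), and it assumes a diagonal covariance per class and equal priors over azimuths. The names `fit_gaussian_classifier`, `log_likelihood`, and `predict_azimuth`, and the toy feature values below, are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_classifier(features, labels, var_floor=1e-6):
    """Fit one diagonal-covariance Gaussian per azimuth label.

    features: list of equal-length lists of floats (assumed: selected IC activations)
    labels: azimuth label for each feature vector (e.g. degrees)
    Returns {label: (means, variances)}.
    """
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)
    model = {}
    for y, xs in groups.items():
        n, dim = len(xs), len(xs[0])
        means = [sum(x[d] for x in xs) / n for d in range(dim)]
        # Maximum-likelihood variance per dimension, floored to avoid division by zero
        variances = [
            max(sum((x[d] - means[d]) ** 2 for x in xs) / n, var_floor)
            for d in range(dim)
        ]
        model[y] = (means, variances)
    return model


def log_likelihood(x, means, variances):
    """Log density of x under an axis-aligned (diagonal) Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def predict_azimuth(model, x):
    """Pick the azimuth whose Gaussian gives x the highest likelihood (equal priors assumed)."""
    return max(model, key=lambda y: log_likelihood(x, *model[y]))
```

Usage with toy two-dimensional features (made up for illustration, not data from the paper):

```python
model = fit_gaussian_classifier(
    [[1.0, 0.1], [1.2, -0.1], [-1.0, 0.0], [-0.8, 0.2]],
    [-45, -45, 45, 45],
)
predict_azimuth(model, [1.1, 0.0])  # -45
```

A diagonal covariance keeps the sketch short. A full-covariance Gaussian would be the natural extension if the selected components were correlated within a position class.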
Pages: 2938-2941
Number of pages: 4
Related papers
50 in total
  • [31] Genre Classification of Movie Trailers using Spectrogram Analysis and Machine Learning
    Visutsak, Porawat
    Pensiri, Fuangfar
    Netisopakul, Ponrudee
    Punsathit, Navapoom
    Rojsuwan, Pongkorn
    Phuthong, Maywadee
    Khanthawat, Worapol
    Prommanee, Wittawat
    Jearaphan, Kanthong
    Saekua, Tawichai
    Anantaprueksa, Tichaporn
    Kavalee, Napas
    Sukkokee, Naravitch
    Kingket, Nanticha
    Teeravas, Prapassorn
    Nimmonrat, Patchara
    Yungyuen, Patcharada
    Nurltes, Pattareeya
    Wongpanti, Ratchanon
    Hompangwhai, Sekson
    Sutthikawee, Haritchaya
    Siangchin, Apichai
    2024 IEEE INTERNATIONAL BLACK SEA CONFERENCE ON COMMUNICATIONS AND NETWORKING, BLACKSEACOM 2024, 2024, : 324 - 327
  • [32] A Probabilistic Model for Robust Localization Based on a Binaural Auditory Front-End
    May, Tobias
    van de Par, Steven
    Kohlrausch, Armin
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (01): : 1 - 13
  • [33] Binaural Localization Based on Weighted Wiener Gain Improved by Incremental Source Attenuation
    Nagata, Yoshifumi
    Iwasaki, Satoshi
    Hariyama, Takahiko
    Fujioka, Toyota
    Obara, Tomita
    Wakatake, Takayuki
    Abe, Masato
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (01): : 52 - 65
  • [34] A New Deep Learning Framework for HF Signal Detection in Wideband Spectrogram
    Li, Weihao
    Wang, Keren
    You, Ling
    Huang, Zhitao
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1342 - 1346
  • [35] A novel spectrogram based lightweight deep learning for IoT spectrum monitoring
    Benazzouza, Salma
    Ridouani, Mohammed
    Salahdine, Fatima
    Hayar, Aawatif
    PHYSICAL COMMUNICATION, 2024, 64
  • [36] User Identification System Using 2D Resized Spectrogram Features of ECG
    Choi, Gyu-Ho
    Bak, Eun-Sang
    Pan, Sung-Bum
    IEEE ACCESS, 2019, 7 : 34862 - 34873
  • [37] Detection of preceding sleep apnea using ECG spectrogram during CPAP titration night: A novel machine-learning and bag-of-features framework
    Linh, Tran Thanh Duy
    Trang, Nguyen Thi Hoang
    Lin, Shang-Yang
    Wu, Dean
    Liu, Wen-Te
    Hu, Chaur-Jong
    JOURNAL OF SLEEP RESEARCH, 2024, 33 (03)
  • [38] Speaker Recognition Method Based on Statistical Features of Spectrograms and CNN
    Chen, Xi
    Wang, Yonghui
    Wang, Lianming
    Yu, Jieqiong
    PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2019), 2019,
  • [39] Towards human-like production and binaural localization of speech sounds in humanoid robots
    Wolff, Robert
    Lasseck, Mario
    Hild, Manfred
    Vilarroya, Oscar
    Hadzibeganovic, Tank
    2009 3RD INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1-11, 2009, : 1525 - +
  • [40] RECTIFIED BINAURAL RATIO: A COMPLEX T-DISTRIBUTED FEATURE FOR ROBUST SOUND LOCALIZATION
    Deleforge, Antoine
    Forbes, Florence
    2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 1257 - 1261