INVESTIGATING SELF-SUPERVISED DEEP REPRESENTATIONS FOR EEG-BASED AUDITORY ATTENTION DECODING

Cited by: 1
Authors
Thakkar, Karan [1 ]
Hai, Jiarui [1 ]
Elhilali, Mounya [1 ]
Affiliations
[1] Johns Hopkins Univ, Lab Computat Audio Percept, Baltimore, MD 21218 USA
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
auditory attention decoding; electroencephalogram (EEG); self-supervised speech representations; SPEECH; ENVIRONMENT;
DOI
10.1109/ICASSP48485.2024.10448271
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
Auditory Attention Decoding (AAD) algorithms play a crucial role in isolating desired sound sources in challenging acoustic environments directly from brain activity. Although recent work has shown promise for AAD using shallow representations such as the auditory envelope and spectrogram, deep Self-Supervised (SS) representations have seen limited exploration at scale. In this study, we comprehensively evaluate linear decoders across 12 deep and 2 shallow representations, applied to EEG data from multiple studies spanning 57 subjects and multiple languages. Our experimental results consistently show that deep features are superior for decoding the background speaker, regardless of dataset and analysis window. This finding suggests a possibly nonlinear encoding of unattended signals in the brain that deep nonlinear features can reveal. Additionally, we analyze how the choice of SS-representation layer and window size affects AAD performance. These findings underscore the potential of deep feature representations for enhancing EEG-based AAD systems.
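The abstract describes fitting linear decoders that map EEG to a stimulus representation and deciding attention by correlating the reconstruction against the competing speech streams. The sketch below illustrates that general backward (stimulus-reconstruction) pipeline on synthetic data; the ridge regularizer, lag count, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def lagged(eeg, max_lag):
    """Build a time-lagged EEG design matrix of shape [T, channels * max_lag]."""
    T, C = eeg.shape
    X = np.zeros((T, C * max_lag))
    for lag in range(max_lag):
        # Column block `lag` holds the EEG delayed by `lag` samples.
        X[lag:, lag * C:(lag + 1) * C] = eeg[:T - lag]
    return X

def fit_backward_model(eeg, feature, max_lag=16, lam=1e2):
    """Ridge-regress a 1-D stimulus feature from lagged EEG: w = (X'X + lam*I)^-1 X'y."""
    X = lagged(eeg, max_lag)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ feature)

def decode_attention(eeg, feat_a, feat_b, w, max_lag=16):
    """Return 0 if the reconstruction correlates more with stream A, else 1."""
    rec = lagged(eeg, max_lag) @ w
    r_a = np.corrcoef(rec, feat_a)[0, 1]
    r_b = np.corrcoef(rec, feat_b)[0, 1]
    return 0 if r_a > r_b else 1
```

The decision rule above (compare Pearson correlations between the EEG reconstruction and each candidate stream over an analysis window) is the standard linear-AAD setup the paper builds on; in practice the feature would be an envelope, spectrogram, or deep SS embedding rather than raw noise, and models would be trained per subject with cross-validation.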
Pages: 1241-1245 (5 pages)
Related Papers (24 total)
[1]   Decoding of the speech envelope from EEG using the VLAAI deep neural network [J].
Accou, Bernd ;
Vanthornhout, Jonas ;
Van Hamme, Hugo ;
Francart, Tom .
SCIENTIFIC REPORTS, 2023, 13 (01)
[2]   Robust decoding of selective auditory attention from MEG in a competing-speaker environment via state-space modeling [J].
Akram, Sahar ;
Presacco, Alessandro ;
Simon, Jonathan Z. ;
Shamma, Shihab A. ;
Babadi, Behtash .
NEUROIMAGE, 2016, 124 :906-917
[3]  
Baevski A, 2020, ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, V33
[4]   Auditory-Inspired Speech Envelope Extraction Methods for Improved EEG-Based Auditory Attention Detection in a Cocktail Party Scenario [J].
Biesmans, Wouter ;
Das, Neetha ;
Francart, Tom ;
Bertrand, Alexander .
IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2017, 25 (05) :402-412
[5]   WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing [J].
Chen, Sanyuan ;
Wang, Chengyi ;
Chen, Zhengyang ;
Wu, Yu ;
Liu, Shujie ;
Chen, Zhuo ;
Li, Jinyu ;
Kanda, Naoyuki ;
Yoshioka, Takuya ;
Xiao, Xiong ;
Wu, Jian ;
Zhou, Long ;
Ren, Shuo ;
Qian, Yanmin ;
Qian, Yao ;
Zeng, Michael ;
Yu, Xiangzhan ;
Wei, Furu .
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) :1505-1518
[6]   The Multivariate Temporal Response Function (mTRF) Toolbox: A MATLAB Toolbox for Relating Neural Signals to Continuous Stimuli [J].
Crosse, Michael J. ;
Di Liberto, Giovanni M. ;
Bednar, Adam ;
Lalor, Edmund C. .
FRONTIERS IN HUMAN NEUROSCIENCE, 2016, 10
[7]  
Etard Octave, 2022, EEG DATASET DECODING
[8]  
Fuglsang S. A., 2018, EEG and audio dataset for auditory attention decoding
[9]   Noise-robust cortical tracking of attended speech in real-world acoustic scenes [J].
Fuglsang, Soren Asp ;
Dau, Torsten ;
Hjortkjaer, Jens .
NEUROIMAGE, 2017, 156 :435-444
[10]   MEG and EEG data analysis with MNE-Python [J].
Gramfort, Alexandre ;
Luessi, Martin ;
Larson, Eric ;
Engemann, Denis A. ;
Strohmeier, Daniel ;
Brodbeck, Christian ;
Goj, Roman ;
Jas, Mainak ;
Brooks, Teon ;
Parkkonen, Lauri ;
Haemaelaeinen, Matti .
FRONTIERS IN NEUROSCIENCE, 2013, 7