EEG-based Auditory Attention Detection with Estimated Speech Sources Separated from an Ideal-binary-masking Process

Cited by: 0
Authors
Wang, Lei [1 ]
Chen, Fei [1 ]
Affiliations
[1] Southern Univ Sci & Technol, Shenzhen Key Lab Robot Percept & Intelligence, Shenzhen, Peoples R China
Source
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022
Keywords
DOI
Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Previous studies have shown that auditory attention can be decoded from the corresponding electroencephalography (EEG) signals. Most existing EEG-based auditory attention detection (AAD) methods identify the target speech in competing-speaker scenes by comparing the correlation coefficients between the speech envelope of each clean stream and the speech envelope reconstructed from the EEG signals. The reliance on separate clean speech streams limits the practical deployment of EEG-based AAD in realistic environments. The current study aimed to develop and assess an EEG-based AAD method using estimated speech sources separated by an ideal-binary-masking (IBM) process. Specifically, the IBM-based speech processing method was first applied to separate the speech sources in the competing-speaker scenes. The estimated IBM-processed speech sources were then used to build the AAD model and extract the target speech stream. Experimental results demonstrated that the AAD accuracies computed with the estimated IBM-processed speech sources were comparable to those obtained with the original clean speech sources over a range of signal-to-masker ratios. These findings indicate that the estimated IBM-processed speech sources provide the information necessary and sufficient for EEG-based AAD methods, facilitating the extraction of attention-driven target speech streams in noisy environments.
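As a rough illustration of the two components described in the abstract, the Python sketch below separates two competing talkers with an ideal binary mask and then makes the envelope-correlation AAD decision. This is not the authors' implementation: the STFT settings, local criterion, envelope extraction, and all function names are assumptions, and the EEG-based envelope reconstruction (e.g., via the mTRF toolbox) is taken as given input.

```python
# Minimal sketch (assumed pipeline, not the paper's code):
# (1) ideal binary masking (IBM) to estimate a source from a two-talker mixture,
# (2) envelope-correlation auditory attention detection (AAD).
import numpy as np
from scipy.signal import stft, istft, hilbert

def ibm_separate(target, masker, fs, lc_db=0.0, nperseg=512):
    """Estimate the target source with an ideal binary mask.

    Keeps time-frequency bins where the target-to-masker ratio exceeds the
    local criterion lc_db; all other bins are zeroed (settings are assumed).
    """
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg)
    mix = T + M                                            # mixture spectrogram
    tmr_db = 20 * np.log10((np.abs(T) + 1e-12) / (np.abs(M) + 1e-12))
    mask = (tmr_db > lc_db).astype(float)                  # ideal binary mask
    _, est = istft(mask * mix, fs=fs, nperseg=nperseg)
    return est

def envelope(x, fs, env_fs=64):
    """Broadband amplitude envelope, crudely downsampled to env_fs Hz (assumed)."""
    env = np.abs(hilbert(x))
    step = int(fs // env_fs)
    return env[::step]

def aad_decision(env_from_eeg, env_speaker1, env_speaker2):
    """Attended speaker = the one whose envelope correlates more strongly
    with the envelope reconstructed from the EEG."""
    n = min(len(env_from_eeg), len(env_speaker1), len(env_speaker2))
    r1 = np.corrcoef(env_from_eeg[:n], env_speaker1[:n])[0, 1]
    r2 = np.corrcoef(env_from_eeg[:n], env_speaker2[:n])[0, 1]
    return 1 if r1 >= r2 else 2
```

In this reading of the abstract, the only change from a conventional clean-stream AAD pipeline is that the envelopes fed to aad_decision come from the IBM-processed source estimates rather than from the original clean streams.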
Pages: 1545 - 1549
Page count: 5