EEG-based Auditory Attention Detection with Estimated Speech Sources Separated from an Ideal-binary-masking Process

Cited by: 0
Authors
Wang, Lei [1 ]
Chen, Fei [1 ]
Affiliations
[1] Southern Univ Sci & Technol, Shenzhen Key Lab Robot Percept & Intelligence, Shenzhen, Peoples R China
Source
PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022
Keywords
DOI
Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Classification Codes: 081104; 0812; 0835; 1405
Abstract
Previous studies have shown that auditory attention can be decoded from the corresponding electroencephalography (EEG) signals. Most existing EEG-based auditory attention detection (AAD) methods identify the target speech in competing-speaker scenes by comparing the correlation coefficients between the speech envelope of each clean stream and the speech envelope reconstructed from the EEG signals. The reliance on separate clean speech streams limits the practical deployment of EEG-based AAD in realistic environments. The current study aimed to develop and assess an EEG-based AAD method using estimated speech sources separated by an ideal-binary-masking (IBM) process. Specifically, the IBM-based speech processing method was first applied to separate the speech sources in the competing-speaker scenes. The estimated IBM-processed speech sources were then used to build the AAD model and extract the target speech stream. Experimental results demonstrated that the AAD accuracies computed with the estimated IBM-processed speech sources were comparable to those obtained with the original clean speech sources over a range of signal-to-masker ratios. These findings indicate that the estimated IBM-processed speech sources provide the information necessary and sufficient for EEG-based AAD methods, facilitating the extraction of attention-driven target speech streams in noisy environments.
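As a rough illustration of the two components described in the abstract, the Python sketch below separates two competing talkers with an ideal binary mask and then makes the envelope-correlation AAD decision. This is not the authors' implementation: the STFT settings, local criterion, envelope extraction, and all function names are assumptions, and the EEG-based envelope reconstruction (e.g., via the mTRF toolbox) is taken as given input.

```python
# Minimal sketch (assumed pipeline, not the paper's code):
# (1) ideal binary masking (IBM) to estimate a source from a two-talker mixture,
# (2) envelope-correlation auditory attention detection (AAD).
import numpy as np
from scipy.signal import stft, istft, hilbert

def ibm_separate(target, masker, fs, lc_db=0.0, nperseg=512):
    """Estimate the target source with an ideal binary mask.

    Keeps time-frequency bins where the target-to-masker ratio exceeds the
    local criterion lc_db; all other bins are zeroed (settings are assumed).
    """
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg)
    mix = T + M                                            # mixture spectrogram
    tmr_db = 20 * np.log10((np.abs(T) + 1e-12) / (np.abs(M) + 1e-12))
    mask = (tmr_db > lc_db).astype(float)                  # ideal binary mask
    _, est = istft(mask * mix, fs=fs, nperseg=nperseg)
    return est

def envelope(x, fs, env_fs=64):
    """Broadband amplitude envelope, crudely downsampled to env_fs Hz (assumed)."""
    env = np.abs(hilbert(x))
    step = int(fs // env_fs)
    return env[::step]

def aad_decision(env_from_eeg, env_speaker1, env_speaker2):
    """Attended speaker = the one whose envelope correlates more strongly
    with the envelope reconstructed from the EEG."""
    n = min(len(env_from_eeg), len(env_speaker1), len(env_speaker2))
    r1 = np.corrcoef(env_from_eeg[:n], env_speaker1[:n])[0, 1]
    r2 = np.corrcoef(env_from_eeg[:n], env_speaker2[:n])[0, 1]
    return 1 if r1 >= r2 else 2
```

In this reading of the abstract, the only change from a conventional clean-stream AAD pipeline is that the envelopes fed to aad_decision come from the IBM-processed source estimates rather than from the original clean streams.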
Pages: 1545 - 1549
Page count: 5