Transformer-Based Model for Auditory EEG Decoding

Cited by: 0
Authors
Chen, Jiaxin [1 ]
Liu, Yin-Long [1 ]
Feng, Rui [1 ]
Yuan, Jiahong [1 ,2 ]
Ling, Zhen-Hua [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
[2] Univ Sci & Technol China, Interdisciplinary Res Ctr Linguist Sci, Hefei, Peoples R China
Source
MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024 | 2025, Vol. 2312
Keywords
EEG; speech decoding; Transformer-based models; match-mismatch; regression
DOI
10.1007/978-981-96-1045-7_11
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
During speech perception, the listener's electroencephalographic (EEG) signals are synchronized with acoustic features such as the speech envelope. This neural tracking mechanism can be used to decode speech information from EEG signals, and much work has been devoted to investigating it. Owing to the limited fitting ability of linear models, many deep learning-based models have been proposed in this field. Recently, Transformer-based models have shown significant potential in EEG tasks. The Auditory EEG Decoding Challenge 2023 released two tasks that associate a person's EEG signals with the speech they are listening to: match-mismatch and regression. In this paper, two Transformer-based models are proposed for these downstream tasks. A convolution layer and a self-attention mechanism are used together to extract both local features and global dependencies. For the match-mismatch task, the Transformer-Dilated Convolution Network is proposed to identify which of the candidate speech segments matches the EEG segment. For the regression task, we design the Transformer-Conformer Network to reconstruct the speech envelope. Results show that our proposed models outperform the baselines on both tasks. In addition, the Transformer-Conformer Network outperforms all challenge teams on the regression track.
Pages: 129-143
Page count: 15
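
The abstract describes combining a convolution layer with self-attention so that one block captures both local temporal features and global dependencies. As a rough illustration only, and not the authors' code, the following minimal PyTorch sketch shows one way such a conv-plus-attention block could look; all module names, layer sizes, the dilation setting, and the residual/norm layout are assumptions, and the paper's actual Transformer-Dilated Convolution Network and Transformer-Conformer Network differ in detail.

```python
# Illustrative sketch only: a block pairing a dilated 1-D convolution (local
# features) with multi-head self-attention (global dependencies), in the
# spirit of the hybrid design described in the abstract. Hyperparameters and
# structure are assumptions, not the paper's architecture.
import torch
import torch.nn as nn


class ConvAttentionBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, kernel_size=3, dilation=2):
        super().__init__()
        # "Same" padding so the time dimension is preserved.
        pad = (kernel_size - 1) // 2 * dilation
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=pad, dilation=dilation)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, time, d_model); Conv1d expects (batch, channels, time).
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm1(x + local)                  # residual + norm
        glob, _ = self.attn(x, x, x, need_weights=False)
        return self.norm2(x + glob)                # residual + norm


# Example: a 64-channel EEG segment of 192 samples, already projected to
# d_model channels (batch of 2).
eeg = torch.randn(2, 192, 64)
print(ConvAttentionBlock()(eeg).shape)  # torch.Size([2, 192, 64])
```

In a full system, stacks of such blocks would feed a classification head over (EEG, speech) segment pairs for the match-mismatch task, or a projection head that reconstructs the speech envelope for the regression task; both heads are omitted in this sketch.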