Transformer-Based Model for Auditory EEG Decoding

Cited by: 0
Authors
Chen, Jiaxin [1 ]
Liu, Yin-Long [1 ]
Feng, Rui [1 ]
Yuan, Jiahong [1 ,2 ]
Ling, Zhen-Hua [1 ,2 ]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei, Peoples R China
[2] Univ Sci & Technol China, Interdisciplinary Res Ctr Linguist Sci, Hefei, Peoples R China
Source
MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2024 | 2025, Vol. 2312
Keywords
EEG; speech decoding; Transformer-based models; match-mismatch; regression
DOI
10.1007/978-981-96-1045-7_11
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
During speech perception, the listener's electroencephalographic (EEG) signals are synchronized with acoustic features such as the speech envelope. This neural tracking mechanism can be used to decode speech information from EEG signals, and much work has been devoted to investigating it. Owing to the limited fitting ability of linear models, many deep learning-based models have been proposed in this field. Recently, Transformer-based models have shown significant potential in EEG tasks. The Auditory EEG Decoding Challenge 2023 released two tasks that associate a person's EEG signals with the speech they are listening to: match-mismatch and regression. In this paper, two Transformer-based models are proposed for these downstream tasks. A convolution layer and a self-attention mechanism are used together to extract both local features and global dependencies. For the match-mismatch task, the Transformer-Dilated Convolution Network is proposed to identify which of the candidate speech segments matches the EEG segment. For the regression task, we design the Transformer-Conformer Network to reconstruct the speech envelope. Results show that our proposed models outperform the baselines on both tasks. In addition, the Transformer-Conformer Network outperforms all challenge teams on the regression track.
Pages: 129-143
Page count: 15
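
The abstract describes combining a convolution layer with self-attention so that one block captures both local temporal features and global dependencies. As a rough illustration only, and not the authors' code, the following minimal PyTorch sketch shows one way such a conv-plus-attention block could look; all module names, layer sizes, the dilation setting, and the residual/norm layout are assumptions, and the paper's actual Transformer-Dilated Convolution Network and Transformer-Conformer Network differ in detail.

```python
# Illustrative sketch only: a block pairing a dilated 1-D convolution (local
# features) with multi-head self-attention (global dependencies), in the
# spirit of the hybrid design described in the abstract. Hyperparameters and
# structure are assumptions, not the paper's architecture.
import torch
import torch.nn as nn


class ConvAttentionBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, kernel_size=3, dilation=2):
        super().__init__()
        # "Same" padding so the time dimension is preserved.
        pad = (kernel_size - 1) // 2 * dilation
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=pad, dilation=dilation)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, time, d_model); Conv1d expects (batch, channels, time).
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        x = self.norm1(x + local)                  # residual + norm
        glob, _ = self.attn(x, x, x, need_weights=False)
        return self.norm2(x + glob)                # residual + norm


# Example: a 64-channel EEG segment of 192 samples, already projected to
# d_model channels (batch of 2).
eeg = torch.randn(2, 192, 64)
print(ConvAttentionBlock()(eeg).shape)  # torch.Size([2, 192, 64])
```

In a full system, stacks of such blocks would feed a classification head over (EEG, speech) segment pairs for the match-mismatch task, or a projection head that reconstructs the speech envelope for the regression task; both heads are omitted in this sketch.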