MEGFormer: Enhancing Speech Decoding from Brain Activity Through Extended Semantic Representations

Cited by: 0
Authors
Boyko, Maria [1, 2]
Druzhinina, Polina [1, 2, 3]
Kormakov, Georgii [1]
Beliaeva, Aleksandra [4]
Sharaev, Maxim [1, 2]
Affiliations
[1] Skolkovo Inst Sci & Technol, Ctr Appl AI, Moscow, Russia
[2] Univ Sharjah, Biomedically Informed Artificial Intelligence Lab, BIMAI Lab, Sharjah, U Arab Emirates
[3] Artificial Intelligence Res Inst AIRI, Moscow, Russia
[4] Lomonosov Moscow State Univ, Moscow, Russia
Source
MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT II | 2024 / Vol. 15002
Keywords
Decoding speech; Contrastive learning; Brain-computer interface; CNN transformer; MEG
DOI
10.1007/978-3-031-72069-7_27
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although multiple studies in recent years have examined decoding speech from brain activity recorded with non-invasive technologies, the task remains challenging because decoding quality is still insufficient for practical applications. An effective solution could advance brain-computer interfaces (BCIs), potentially restoring communication for individuals with speech impairments. At the same time, such studies can provide fundamental insights into how the brain processes speech and sound. One approach to decoding perceived speech uses a self-supervised model trained with contrastive learning. This model matches segments of the same length from magnetoencephalography (MEG) to audio in a zero-shot way. We improve this method by incorporating a new architecture based on a CNN transformer. As a result of the proposed modifications, the accuracy of perceived speech decoding increases significantly, from the current 69% to 83% and from 67% to 70% on publicly available datasets. Notably, the greatest improvement in accuracy is observed for longer speech fragments that carry semantic meaning, rather than for shorter fragments containing sounds and phonemes. Our code is available at https://github.com/maryjis/MEGformer/.
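The matching step described in the abstract can be illustrated with a minimal sketch. It assumes a CLIP-style InfoNCE objective over cosine similarities, which the abstract does not specify. The function names, the temperature value, and the toy 3-dimensional embeddings are hypothetical and are not taken from the MEGformer code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similarity_matrix(meg, audio):
    """sim[i][j] = similarity of MEG segment i and audio segment j."""
    return [[cosine(m, a) for a in audio] for m in meg]

def info_nce(sim, temperature=0.1):
    """Mean cross-entropy of selecting audio i for MEG segment i.

    The temperature is an illustrative assumption, not a reported value.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(sim)

def top1_accuracy(sim):
    """Fraction of MEG segments whose best-scoring audio is the paired one."""
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(sim)

# Toy embeddings (hypothetical): row i of `meg` is paired with row i of `audio`.
meg = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
audio = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 1.0]]

sim = similarity_matrix(meg, audio)
sim_shuffled = similarity_matrix(meg, audio[::-1])  # pairing deliberately broken

print("top-1 accuracy:", top1_accuracy(sim))
print("loss (paired):", info_nce(sim))
print("loss (shuffled):", info_nce(sim_shuffled))
```

At test time only the retrieval step is needed: candidate audio segments that were never seen during training are ranked by similarity to the MEG embedding, which is what makes the matching zero-shot.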
Pages: 281-290
Page count: 10