Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models

Cited by: 12
Authors
Berezutskaya, Julia [1 ,2 ]
Freudenburg, Zachary V. [1]
Vansteensel, Mariska J. [1 ]
Aarnoutse, Erik J. [1 ]
Ramsey, Nick F. [1 ]
van Gerven, Marcel A. J. [2 ]
Affiliations
[1] Univ Med Ctr Utrecht, Brain Ctr, Dept Neurol & Neurosurg, NL-3584 CX Utrecht, Netherlands
[2] Donders Ctr Brain Cognit & Behav, NL-6525 GD Nijmegen, Netherlands
Funding
European Research Council; US National Institutes of Health;
Keywords
brain; speech; deep neural networks; brain-computer interfaces; electrocorticography; audio reconstruction; neural decoding; COMMUNICATION; LOCALIZATION; NETWORKS;
DOI
10.1088/1741-2552/ace8be
Chinese Library Classification
R318 [Biomedical Engineering];
Discipline Code
0831;
Abstract
Objective. Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver the best and most directly applicable results is crucial for advancing the field. Approach. In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task. Main results. We show that (1) dedicated machine learning optimization of reconstruction models is key for achieving the best reconstruction performance; (2) individual word decoding in reconstructed speech achieves 92%-100% accuracy (chance level is 8%); (3) direct reconstruction from sensorimotor brain activity produces intelligible speech. Significance. These results underline the need for model optimization in achieving the best speech decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex offers for the development of next-generation BCI technology for communication.
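The reported chance level of 8% implies a closed vocabulary of roughly 12 candidate words (1/12 ≈ 8.3%); the vocabulary size is an inference from the abstract, not stated in it. A minimal sketch of how word-decoding accuracy would be scored against that chance level, with made-up labels purely for illustration:

```python
# Assumption (inferred, not stated in the abstract): a 12-word closed vocabulary,
# giving the ~8% chance level reported for individual word decoding.
n_words = 12
chance = 1 / n_words  # ~0.083

def decoding_accuracy(predicted, actual):
    """Fraction of trials where the word decoded from reconstructed
    speech matches the target word actually spoken."""
    assert len(predicted) == len(actual)
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Illustrative call with hypothetical word indices (3 of 4 trials correct):
acc = decoding_accuracy([0, 1, 2, 3], [0, 1, 2, 5])
```

Any accuracy well above `chance` (here the paper reports 92%-100%) indicates the reconstructions carry word-discriminative information.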
Pages: 24