Direct speech reconstruction from sensorimotor brain activity with optimized deep learning models

Cited by: 12
Authors
Berezutskaya, Julia [1, 2]
Freudenburg, Zachary V. [1]
Vansteensel, Mariska J. [1]
Aarnoutse, Erik J. [1]
Ramsey, Nick F. [1]
van Gerven, Marcel A. J. [2]
Affiliations
[1] Univ Med Ctr Utrecht, Brain Ctr, Dept Neurol & Neurosurg, NL-3584 CX Utrecht, Netherlands
[2] Donders Ctr Brain Cognit & Behav, NL-6525 GD Nijmegen, Netherlands
Funding
European Research Council; National Institutes of Health (USA)
Keywords
brain; speech; deep neural networks; brain-computer interfaces; electrocorticography; audio reconstruction; neural decoding; COMMUNICATION; LOCALIZATION; NETWORKS
DOI
10.1088/1741-2552/ace8be
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Objective. Development of brain-computer interface (BCI) technology is key for enabling communication in individuals who have lost the faculty of speech due to severe motor paralysis. A BCI control strategy that is gaining attention employs speech decoding from neural data. Recent studies have shown that a combination of direct neural recordings and advanced computational models can provide promising results. Understanding which decoding strategies deliver the best and most directly applicable results is crucial for advancing the field. Approach. In this paper, we optimized and validated a decoding approach based on speech reconstruction directly from high-density electrocorticography recordings from sensorimotor cortex during a speech production task. Main results. We show that (1) dedicated machine learning optimization of reconstruction models is key for achieving the best reconstruction performance; (2) individual word decoding in reconstructed speech achieves 92%-100% accuracy (chance level is 8%); (3) direct reconstruction from sensorimotor brain activity produces intelligible speech. Significance. These results underline the need for model optimization in achieving the best speech decoding results and highlight the potential that reconstruction-based speech decoding from sensorimotor cortex can offer for the development of next-generation BCI technology for communication.
Pages: 24