Towards Naturalistic Speech Decoding from Intracranial Brain Data

Cited by: 1
Authors
Berezutskaya, Julia [1]
Ambrogioni, Luca [1]
Ramsey, Nicolas F. [2]
van Gerven, Marcel A. J. [1]
Affiliations
[1] Radboud Univ Nijmegen, Donders Inst Brain Cognit & Behav, Thomas van Aquinostr 4, NL-6525 GD Nijmegen, Netherlands
[2] Univ Med Ctr Utrecht, Brain Ctr, Dept Neurol & Neurosurg, Heidelberglaan 100, NL-3584 CX Utrecht, Netherlands
Source
2022 44TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC | 2022
Funding
European Research Council
DOI
10.1109/EMBC48229.2022.9871301
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech decoding from brain activity can enable the development of brain-computer interfaces (BCIs) that restore naturalistic communication in paralyzed patients. Previous work has focused on developing decoding models from isolated speech data with a clean background and multiple repetitions of the material. In this study, we describe a novel approach to speech decoding that relies on a generative adversarial network (GAN) to reconstruct speech from brain data recorded during a naturalistic speech listening task (watching a movie). We compared the GAN-based approach, in which reconstruction was done from the compressed latent representation of sound decoded from the brain, with several baseline models that reconstructed the sound spectrogram directly. We show that the novel approach provides more accurate reconstructions than the baselines. These results underscore the potential of GAN models for speech decoding in naturalistic noisy environments and for further advancing BCIs for naturalistic communication.
Pages: 3100-3104
Page count: 5