Speech synthesis from intracranial stereotactic electroencephalography using a neural vocoder

Cited by: 2
Authors
Arthur, Frigyes Viktor [1]
Csapo, Tamas Gabor [1]
Affiliation
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat BME, Budapest, Hungary
Source
INFOCOMMUNICATIONS JOURNAL | 2024 / Vol. 16 / No. 01
Keywords
human-computer interaction; SEEG; BCI; brain-computer interface; WAVEGLOW; EEG
DOI
10.36244/ICJ.2024.1.6
CLC number
TN [Electronic technology, communication technology]
Discipline code
0809
Abstract
Speech is one of the most important human biosignals. However, only some of the speech production characteristics required for a successful speech-based Brain-Computer Interface (BCI) are fully understood. A proper brain-to-speech system that can generate intelligible, natural-sounding speech of full sentences poses a great challenge. In our study, we used the Single WordProduction-Dutch-iBIDS dataset, in which speech and intracranial stereotactic electroencephalography (sEEG) brain signals were recorded simultaneously during a single-word production task. We applied deep neural networks (FC-DNN, 2D-CNN, and 3D-CNN) to the data of ten speakers for sEEG-to-Mel-spectrogram prediction. Next, we synthesized speech using the WaveGlow neural vocoder. Our objective and subjective evaluations showed that the DNN-based approaches with the neural vocoder outperform the baseline linear regression model using Griffin-Lim. The synthesized samples resemble the original speech but are not yet intelligible, and the results are clearly speaker-dependent. In the long term, speech-based BCI applications might be useful for people with speech impairments or neurological disorders.
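The baseline named in the abstract maps sEEG features to Mel-spectrogram frames with linear regression before Griffin-Lim vocoding. The regression half can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the frame-level sEEG feature matrix, the ridge penalty, the 80-bin Mel target, and the function names `fit_linear_mel_decoder` and `predict_mel` are all hypothetical, and the Griffin-Lim step is omitted.

```python
import numpy as np


def fit_linear_mel_decoder(eeg_feats, mel, ridge=1e-3):
    """Fit W, b so that mel is approximated by eeg_feats @ W + b.

    Hypothetical sketch: ridge-regularized least squares.
    eeg_feats: (n_frames, n_features) sEEG feature frames (assumed features)
    mel:       (n_frames, n_mels) target Mel-spectrogram frames
    """
    n = eeg_feats.shape[0]
    # Append a column of ones so the bias is estimated jointly with W.
    X = np.hstack([eeg_feats, np.ones((n, 1))])
    reg = ridge * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    # Solve the regularized normal equations (X^T X + R) coef = X^T mel.
    coef = np.linalg.solve(X.T @ X + reg, X.T @ mel)
    return coef[:-1], coef[-1]


def predict_mel(eeg_feats, W, b):
    """Predict Mel-spectrogram frames from sEEG feature frames."""
    return eeg_feats @ W + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((1000, 32))  # stand-in for sEEG feature frames
    mel = rng.standard_normal((1000, 80))    # stand-in for 80-bin Mel frames
    W, b = fit_linear_mel_decoder(feats, mel)
    print(predict_mel(feats, W, b).shape)    # (1000, 80)
```

In a full pipeline, the predicted frames would then be passed to a vocoder (Griffin-Lim for this baseline, WaveGlow for the DNN-based systems) to produce a waveform.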
Pages: 47-55
Number of pages: 9