Brain-Computer Interface: Applications to Speech Decoding and Synthesis to Augment Communication

Cited: 46
Authors
Luo, Shiyu [1 ]
Rabbani, Qinwan [2 ]
Crone, Nathan E. [3 ]
Affiliations
[1] Johns Hopkins Univ, Sch Med, Dept Biomed Engn, Baltimore, MD 21205 USA
[2] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA
[3] Johns Hopkins Univ, Sch Med, Dept Neurol, Baltimore, MD 21205 USA
Funding
National Institutes of Health (USA);
Keywords
Speech synthesis; Brain-computer interface; Locked-in syndrome; Electrocorticography; ECoG; ALTERED AUDITORY-FEEDBACK; MOTOR CONTROL; GAMMA-OSCILLATIONS; ELECTROCORTICOGRAPHY; CLASSIFICATION; VARIABILITY; SPEAKERS; HEARING; SYSTEM;
DOI
10.1007/s13311-022-01190-2
CLC Classification Number
R74 [Neurology and Psychiatry];
Subject Classification Number
Abstract
Damage or degeneration of motor pathways necessary for speech and other movements, as in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient communication without affecting brain structures responsible for language or cognition. In the worst-case scenario, this can result in locked-in syndrome (LIS), a condition in which individuals cannot initiate communication and can only express themselves by answering yes/no questions with eye blinks or other rudimentary movements. Existing augmentative and alternative communication (AAC) devices that rely on eye tracking can improve the quality of life for people with this condition, but brain-computer interfaces (BCIs) are also increasingly being investigated as AAC devices, particularly when eye tracking is too slow or unreliable. Moreover, with recent and ongoing advances in machine learning and neural recording technologies, BCIs may offer the only means to go beyond cursor control and text generation on a computer to allow real-time synthesis of speech, which would arguably be the most efficient and expressive channel for communication. The potential for BCI speech synthesis has only recently been realized because of seminal studies of the neuroanatomical and neurophysiological underpinnings of speech production using intracranial electrocorticographic (ECoG) recordings in patients undergoing epilepsy surgery. These studies have shown that cortical areas responsible for vocalization and articulation are distributed over a large area of ventral sensorimotor cortex, and that it is possible to decode speech and reconstruct its acoustics from ECoG if these areas are recorded with sufficiently dense and comprehensive electrode arrays. In this article, we review these advances, including the latest neural decoding strategies, which range from deep learning models to the direct concatenation of speech units. We also discuss state-of-the-art vocoders that are integral to constructing natural-sounding audio waveforms for speech BCIs. Finally, this review outlines some of the challenges ahead in directly synthesizing speech for patients with LIS.
Pages: 263-273
Page count: 11
References
106 records (first 10 shown)
[1] Airaksinen M, Juvela L, Bollepalli B, Yamagishi J, Alku P. A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 26(9): 1658-1670.
[2] Akbari H. Scientific Reports, 2019.
[3] Angrick M, Herff C, Johnson G, Shih J, Krusienski D, Schultz T. Speech Spectrogram Estimation from Intracranial Brain Activity using a Quantization Approach. INTERSPEECH 2020, 2020: 2777-2781.
[4] Angrick M, Ottenhoff MC, Diener L, Ivucic D, Ivucic G, Goulis S, Saal J, Colon AJ, Wagner L, Krusienski DJ, Kubben PL, Schultz T, Herff C. Real-time synthesis of imagined speech processes from minimally invasive recordings of neural activity. Communications Biology, 2021, 4(1).
[5] Angrick M, Herff C, Mugler E, Tate MC, Slutzky MW, Krusienski DJ, Schultz T. Speech synthesis from ECoG using densely connected 3D convolutional neural networks. Journal of Neural Engineering, 2019, 16(3).
[6] Antipova EA, Purdy SC, Blakeley M, Williams S. Effects of altered auditory feedback (AAF) on stuttering frequency during monologue speech production. Journal of Fluency Disorders, 2008, 33(4): 274-290.
[7] Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences. Nature, 2019, 568(7753): 493+.
[8] Bauer G. Journal of Neurology, 1979, 221: 77. DOI: 10.1007/BF00313105.
[9] Benabid AL, Costecalde T, Eliseyev A, Charvet G, Verney A, Karakas S, Foerster M, Lambert A, Moriniere B, Abroug N, Schaeffer MC, Moly A, Sauter-Starace F, Ratel D, Moro C, Torres-Martinez N, Langar L, Oddoux M, Polosan M, Pezzani S, Auboiroux V, Aksenova T, Mestais C, Chabardes S. An exoskeleton controlled by an epidural wireless brain-machine interface in a tetraplegic patient: a proof-of-concept demonstration. Lancet Neurology, 2019, 18(12): 1112-1122.
[10] Benzeghiba M, De Mori R, Deroo O, Dupont S, Erbes T, Jouvet D, Fissore L, Laface P, Mertins A, Ris C, Rose R, Tyagi V, Wellekens C. Automatic speech recognition and speech variability: A review. Speech Communication, 2007, 49(10-11): 763-786.