Generalizing neural signal-to-text brain-computer interfaces

Cited by: 4
Authors
Sheth, Janaki [1]
Tankus, Ariel [2, 3, 4]
Tran, Michelle [5]
Pouratian, Nader [5]
Fried, Itzhak [5]
Speier, William [6]
Affiliations
[1] Univ Calif Los Angeles, Dept Phys & Astron, Los Angeles, CA USA
[2] Tel Aviv Univ, Sagol Sch Neurosci, Tel Aviv, Israel
[3] Sourasky Med Ctr, Funct Neurosurg Unit, Tel Aviv, Israel
[4] Tel Aviv Univ, Sackler Sch Med, Dept Neurol & Neurosurg, Tel Aviv, Israel
[5] Univ Calif Los Angeles, Dept Neurosurg, Los Angeles, CA USA
[6] Univ Calif Los Angeles, Dept Radiol, Los Angeles, CA 90095 USA
Keywords
brain-computer interfaces; neural speech recognition; intra-cranial depth electrodes
DOI
10.1088/2057-1976/abf6ab
Chinese Library Classification (CLC) number
R8 [Special Medicine]; R445 [Diagnostic Imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Objective: Brain-computer interfaces (BCIs) may help patients whose communication abilities are faltering due to neurodegenerative diseases produce text or speech by direct neural processing. However, their practical realization has proven difficult because existing interfaces are limited in speed, accuracy, and generalizability. The goal of this study is to evaluate the BCI performance of a robust speech decoding system that translates neural signals evoked by speech into textual output. While previous studies have approached this problem by using neural signals to choose from a limited set of possible words, we employ a more general model that can type any word from a large corpus of English text.
Approach: In this study, we create an end-to-end BCI that translates neural signals associated with overt speech into text output. Our decoding system first isolates frequency bands in the input depth-electrode signal that carry differential information about the production of various phonemic classes. These bands form a feature set that feeds into a Long Short-Term Memory (LSTM) model, which estimates at each time point a probability distribution across all phonemes uttered by a subject. Finally, a particle filtering algorithm temporally smooths these probabilities by incorporating prior knowledge of the English language, and outputs text corresponding to the decoded word. The generalizability of our decoder comes from the lack of a vocabulary constraint on this output word.
Main result: This method was evaluated using a dataset of 6 neurosurgical patients implanted with intra-cranial depth electrodes to identify seizure foci for potential surgical treatment of epilepsy. We averaged 32% word accuracy and, at the phoneme level, obtained 46% precision, 51% recall, and a 73.32% average phoneme error rate, while also achieving significant increases in speed when compared to several other BCI approaches.
Significance: Our study employs a more general neural signal-to-text model which could facilitate communication by patients in everyday environments.
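Note on the phoneme error rate: the abstract reports this metric without stating how it was computed. A minimal sketch under one common convention, assumed here rather than taken from the paper, divides the edit distance between the decoded and reference phoneme sequences by the reference length. The function names and the example phoneme labels below are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference phoneme (deletion)
                curr[j - 1] + 1,     # extra decoded phoneme (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Assumed definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution and one insertion over 3 phonemes.
reference = ["K", "AE", "T"]
decoded = ["K", "AH", "T", "S"]
per = phoneme_error_rate(reference, decoded)
print(f"PER = {per:.2%}")
```

Under this definition the rate can exceed 100% when the decoded sequence contains many insertions, so it is not simply the complement of precision or recall.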
Pages: 10