An Interpretable Deep Learning Model for Speech Activity Detection Using Electrocorticographic Signals

Cited by: 6
Authors
Stuart, Morgan [1 ]
Lesaja, Srdjan [2 ]
Shih, Jerry J. [3 ]
Schultz, Tanja [4 ]
Manic, Milos [1 ]
Krusienski, Dean J. [2 ]
Affiliations
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
[2] Virginia Commonwealth Univ, Dept Biomed Engn, Richmond, VA 23284 USA
[3] UCSD Hlth, Neurol Dept, San Diego, CA 92093 USA
[4] Univ Bremen, Cognit Syst Lab, D-28359 Bremen, Germany
Keywords
Electrodes; Brain modeling; Band-pass filters; Decoding; Convolution; Computer architecture; Deep learning; Brain-Computer Interfaces (BCIs); electroencephalography
DOI
10.1109/TNSRE.2022.3207624
Chinese Library Classification
R318 [Biomedical Engineering]
Discipline Code
0831
Abstract
Numerous state-of-the-art solutions for neural speech decoding and synthesis incorporate deep learning into the processing pipeline. These models are typically opaque and can require significant computational resources for training and execution. This work presents a deep learning architecture that learns input bandpass filters capturing task-relevant spectral features directly from the data. Incorporating such explainable feature extraction into the model furthers the goal of creating end-to-end architectures that enable automated subject-specific parameter tuning while yielding interpretable results. The model is evaluated on intracranial brain recordings collected during a speech task. Operating on raw, unprocessed time samples, the model detects the presence of speech at every time sample in a causal manner, making it suitable for online application. Model performance is comparable or superior to that of existing approaches requiring substantial signal preprocessing, and the learned frequency bands converge to ranges supported by previous studies.
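The two ingredients the abstract highlights, bandpass filters parameterized by their cutoff frequencies (so they can be learned and then read off for interpretation) and strictly causal filtering for online use, can be sketched in NumPy. This is an illustrative reconstruction, not the paper's implementation: the function names, kernel length, window choice, and the 70-170 Hz example band are all assumptions.

```python
import numpy as np

def sinc_bandpass_kernel(f_low, f_high, kernel_size, fs):
    """Bandpass FIR kernel built as the difference of two windowed-sinc
    low-pass filters with cutoffs f_high and f_low (Hz). In a trainable
    model, the cutoffs would be the learnable parameters, so the fitted
    band can be inspected directly (illustrative sketch)."""
    t = (np.arange(kernel_size) - (kernel_size - 1) / 2) / fs
    # np.sinc(x) = sin(pi*x)/(pi*x): np.sinc(2*f*t) is an ideal low-pass
    # impulse response with cutoff f, up to amplitude scaling.
    lp_high = (2 * f_high / fs) * np.sinc(2 * f_high * t)
    lp_low = (2 * f_low / fs) * np.sinc(2 * f_low * t)
    return (lp_high - lp_low) * np.hamming(kernel_size)

def causal_filter(x, kernel):
    """Causal convolution: zero-pad only the past, so each output sample
    depends on current and previous inputs alone, as required for
    per-sample online detection."""
    pad = len(kernel) - 1
    xp = np.concatenate([np.zeros(pad), x])
    return np.convolve(xp, kernel, mode="valid")

# Example: isolate a high-gamma-like band (70-170 Hz assumed for
# illustration) from a toy two-tone signal sampled at 1 kHz.
fs = 1000.0
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 120 * t)
k = sinc_bandpass_kernel(70.0, 170.0, 129, fs)
y = causal_filter(x, k)  # same length as x; 10 Hz tone is suppressed
```

After fitting, the learned `f_low`/`f_high` values can be reported per electrode, which is the interpretability property the abstract emphasizes.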
Pages: 2783-2792
Page count: 10