Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space

Cited by: 6
Authors
Li, Kan [1]
Principe, Jose C. [1]
Affiliations
[1] Univ Florida, Dept Elect & Comp Engn, Computat NeuroEngn Lab, Gainesville, FL 32611 USA
Funding
National Science Foundation (USA)
Keywords
spike-based learning; noise-robust automatic speech recognition (ASR); keyword spotting; kernel adaptive filtering (KAF); reproducing kernel Hilbert space (RKHS); kernel method; neuromorphic computation; WORD RECOGNITION; MARKOV-MODELS; TIME-SERIES; COMPUTATION; FRAMEWORK; FEATURES; FILTER;
DOI
10.3389/fnins.2018.00194
Chinese Library Classification number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into an infinite-dimensional function space, namely a reproducing kernel Hilbert space (RKHS), using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input by gradient descent. This kernelized recurrent system is very parsimonious and, when trained discriminatively, achieves the necessary memory depth through feedback of its internal states, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and the internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel affects neither the system nor the learning algorithm. Moreover, we show that this framework can outperform both traditional hidden Markov model (HMM) speech processing and neuromorphic implementations based on spiking neural networks (SNNs), yielding accurate and ultra-low-power word spotters. As a proof of concept, we demonstrate its capabilities on the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR), or keyword spotting. Compared to an HMM using a Mel-frequency cepstral coefficient (MFCC) front-end without time derivatives, our MFCC-KAARMA (kernel adaptive autoregressive-moving-average) system offered improved performance. With a spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regimes.
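The LIF spike generation step mentioned in the abstract can be illustrated with a minimal single-channel sketch. This is not the authors' implementation: the function name `lif_encode`, the forward-Euler update, and every parameter value (`dt`, `tau`, `threshold`, `gain`, `refractory`) are assumptions chosen for illustration, and the input is assumed to be a non-negative channel envelope rather than raw audio.

```python
def lif_encode(signal, dt=1e-3, tau=0.02, threshold=1.0, gain=100.0,
               refractory=0.002):
    """Return spike times (in seconds) of one leaky integrate-and-fire unit.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    v = 0.0             # membrane potential
    hold_until = -1.0   # end of the current refractory period
    spikes = []
    for n, x in enumerate(signal):
        t = n * dt
        if t < hold_until:
            continue    # input is ignored while the unit is refractory
        # Forward-Euler step of dv/dt = -v / tau + gain * x
        v += dt * (-v / tau + gain * x)
        if v >= threshold:
            spikes.append(t)
            v = 0.0     # reset after firing
            hold_until = t + refractory
    return spikes


# One second of constant drive at three hypothetical input levels.
silent = lif_encode([0.0] * 1000)
weak = lif_encode([0.6] * 1000)
strong = lif_encode([2.0] * 1000)
```

In this sketch a stronger input drives the membrane potential to threshold sooner, so it produces more spikes per second; a multi-channel front-end would apply one such unit to each channel of the filtered audio.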
Pages: 17