Spike-based encoding and learning of spectrum features for robust sound recognition

Cited by: 16
Authors
Xiao, Rong [1]
Tang, Huajin [1]
Gu, Pengjie [1]
Xu, Xiaoliang [2]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Neuromorph Comp Res Ctr, Chengdu, Sichuan, Peoples R China
[2] Hangzhou Dianzi Univ, Coll Comp Sci, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Temporal coding; Temporal learning; Time-frequency information; Spiking neural network; Sound recognition; NEURON; CODE; CLASSIFICATION; OSCILLATIONS; NETWORKS;
DOI
10.1016/j.neucom.2018.06.022
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Biological evidence suggests that local time-frequency (LTF) information can be utilized to improve the recognition rate of sounds in the presence of noise. However, most conventional methods use stationary (frequency-based) features, which are not robust to noise because each stationary feature contains a mixture of spectral information from both noise and signal. This paper proposes a spike-timing-based model, named LTF-SNN, that uses spiking neural networks (SNNs) to encode and learn the LTF features extracted from the sound spectrogram. In this model, we encode the reliable LTF features into spike train patterns and train the network with different spike-based learning rules. We analyze the efficacy of the spike-based feature encoding method and the recognition performance of the model using two classes of SNN learning algorithms: ReSuMe and Tempotron. By utilizing temporal coding and learning, networks of spiking neurons can effectively perform robust sound recognition tasks. Experimental results demonstrate that the model achieves superior performance in mismatched conditions compared with benchmark approaches. © 2018 Elsevier B.V. All rights reserved.
Pages: 65-73
Number of pages: 9