Robust Environmental Sound Recognition With Sparse Key-Point Encoding and Efficient Multispike Learning

Cited by: 14

Authors:

Yu, Qiang [1]

Yao, Yanli [1]

Wang, Longbiao [1]

Tang, Huajin [2]

Dang, Jianwu [1]

Tan, Kay Chen [3]

Affiliations:

[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin 300350, Peoples R China

[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 610065, Peoples R China

[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2021, Vol. 32, No. 2

Funding:

National Natural Science Foundation of China

Keywords:

Encoding; Task analysis; Hidden Markov models; Neurons; Biological neural networks; Mel frequency cepstral coefficient; Biological information theory; Brain-like processing; feature extraction; multispike learning; neuromorphic computing; robust sound recognition; spike encoding; spiking neural networks (SNNs); AUTOMATIC SPEECH RECOGNITION; EVENT CLASSIFICATION; FEATURES; NETWORKS; NEURON;

DOI:

10.1109/TNNLS.2020.2978764

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The capability for environmental sound recognition (ESR) can determine the fitness of individuals by helping them avoid dangers or pursue opportunities when critical sound events occur. The fundamental principles of biological systems that give rise to such a remarkable ability remain mysterious. Additionally, the practical importance of ESR has attracted an increasing amount of research attention, but the chaotic and nonstationary nature of environmental sounds continues to make it a challenging task. In this article, we propose a spike-based framework from a more brain-like perspective for the ESR task. Our framework is a unified system that consistently integrates three major functional parts: sparse encoding, efficient learning, and robust readout. We first introduce a simple sparse encoding, where key points are used for feature representation, and demonstrate its generalization to both spike- and nonspike-based systems. Then, we evaluate the learning properties of different learning rules in detail and add our own contributions for improvement. Our results highlight the advantages of multispike learning, providing a selection reference for various spike-based developments. Finally, we combine the multispike readout with the other parts to form a system for ESR. Experimental results show that our framework performs best compared with other baseline approaches. In addition, we show that our spike-based framework has several advantageous characteristics, including early decision making, small dataset acquiring, and ongoing dynamic processing. Our framework is the first attempt to apply the multispike characteristic of nervous neurons to ESR. The outstanding performance of our approach could help draw more research efforts to push the boundaries of the spike-based paradigm to a new horizon.

Pages: 625-638

Number of pages: 14
