Very short time environmental sound classification based on spectrogram pattern matching

Cited by: 64

Authors:

Khunarsal, Peerapol [1]

Lursinsap, Chidchanok [1]

Raicharoen, Thanapant [2]

Affiliations:

[1] Chulalongkorn Univ, Dept Math, Adv Virtual & Intelligent Comp Ctr AVIC, Bangkok 10330, Thailand

[2] Minist Def, Off Permanent Secretary Def, Def Informat & Space Technol Dept, Bangkok 10200, Thailand

Source:

INFORMATION SCIENCES | 2013 / Vol. 243

Keywords:

Spectrogram pattern matching; Environmental sound recognition; k-Nearest neighbor (k-NN); AUDIO; RECOGNITION; VECTOR; MUSIC

DOI:

10.1016/j.ins.2013.04.014

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Environmental sounds are unstructured and similar to noise. However, the recognition of environmental sounds can benefit crime investigations, warning systems for elderly persons, and security systems. A few past research projects have addressed the classification of environmental sounds. In this paper, we propose an environmental sound classification algorithm using spectrogram pattern matching along with neural network and k-nearest neighbor (k-NN) classifiers. Unlike other techniques, our approach is based on the observation that local features are more important than global features. In addition, our technique avoids the problem of filtering out less informative and irrelevant frequencies in the classification step. Twenty types of sound from the BBC and Sound Ideas databases, with each sound sample longer than 10 min, were tested with our algorithm. The spectrogram feature was compared with mel frequency cepstral coefficient (MFCC), linear prediction coefficient (LPC), and matching pursuit (MP) features. Two factors affecting classification accuracy, window size and sampling rate, were also investigated to find a suitable value for each. We also investigated all combinations of these features. Using the k-NN classifier, the maximum accuracy of 94.98% occurred when the spectrogram, LPC, and MP features were combined. The experiments showed that the spectrogram feature with a feed-forward neural network can effectively classify 1-s audio clips with an accuracy of 85.66%. Furthermore, a longer clip duration can significantly increase classification accuracy, up to a certain limit, without affecting cost. Using an audio clip duration of 6 s, the spectrogram feature with a feed-forward neural network provided the best classification accuracy, at least 90.57%. (C) 2013 Elsevier Inc. All rights reserved.


Pages: 57-74

Page count: 18

