Analysis and classification of acoustic scenes with wavelet transform-based mel-scaled features

Times cited: 12
Authors
Waldekar, Shefali [1]
Saha, Goutam [1]
Affiliations
[1] IIT Kharagpur, Dept Elect & Elect Commun Engn, Kharagpur, W Bengal, India
Keywords
DCASE; Environmental sounds; Haar function; MFCC; SVM; OF-FRAMES APPROACH; URBAN SOUNDSCAPES; SUFFICIENT MODEL; RECOGNITION
DOI
10.1007/s11042-019-08279-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Analysis of audio from real-life environments and its categorization into different acoustic scenes can make context-aware devices and applications more efficient. Unlike speech, such signals have overlapping frequency content while spanning a much larger audible frequency range. They are also less structured than speech or music signals. The wavelet transform has good time-frequency localization ability owing to its variable-length basis functions. Consequently, it facilitates the extraction of more characteristic information from environmental audio. This paper attempts to classify acoustic scenes by a novel use of wavelet-based mel-scaled features. The design of the proposed framework is based on experiments conducted on two datasets that have the same scene classes but differ in sample length and amount of data (in hours). The framework outperformed two benchmark systems, one based on mel-frequency cepstral coefficients and Gaussian mixture models and the other based on log mel-band energies and a multi-layer perceptron. We also present an investigation of the use of different training and test sample durations for acoustic scene classification.
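The abstract does not spell out the feature pipeline, so the following is only an illustrative sketch of what "variable-length basis functions" means, using the Haar function named in the keywords. The function names (`haar_step`, `haar_dwt`) and the eight input values are hypothetical and are not taken from the paper. Each decomposition level halves the resolution, so coarse coefficients summarize long spans of the input while fine coefficients capture local changes.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT.

    Returns [approx_L, detail_L, ..., detail_1], coarsest first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


# Hypothetical log mel-band energies for one frame (made-up values,
# 8 bands so that three levels of decomposition are possible).
log_mel_frame = [4.0, 4.0, 2.0, 2.0, 1.0, 3.0, 0.0, 0.0]
coeffs = haar_dwt(log_mel_frame, levels=3)

for band in coeffs:
    print([round(c, 3) for c in band])
```

Because this Haar variant is orthonormal, the sum of squared coefficients equals the sum of squared inputs, which makes a convenient sanity check on any implementation.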
Pages: 7911-7926
Page count: 16