Scalable identification of mixed environmental sounds, recorded from heterogeneous sources

Cited by: 22
Authors
Beltran, Jessica [1]
Chavez, Edgar [1]
Favela, Jesus [1]
Affiliations
[1] CICESE, Ensenada 22860, Baja California, Mexico
Keywords
Sound event classification; Audio fingerprint; Overlapped sounds; Event recognition; Features
DOI
10.1016/j.patrec.2015.08.027
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sound events can be used to establish context and thus assist a user in performing context-dependent tasks. State-of-the-art methods can identify isolated sound events, even with background noise when that noise can be modeled; for mixed sound recognition, however, the challenge still stands. The problem consists in identifying all the sounds occurring in a stream. In this paper we propose an audio representation suitable for mixed sound identification without background/foreground modeling. Our approach is also lightweight in both computational and space complexity, and the final representation does not depend on the length of the input sound. We extract spectral, band-split, frame-level features and their first and second derivatives in each band. The final representation is a set of histograms, one for each band. We show experimentally that this representation is robust and allows the identification of overlapped sound events. We compared our approach against a representation based on Mel Frequency Cepstral Coefficients and Non-Negative Matrix Factorization for blind source separation using a single microphone; this was the only approach comparable to ours. For testing we conducted two different sets of experiments. In the first, we collected poor-quality audio recordings using a low-end smartphone for training. Without further enhancement or processing we were able to identify the components of classes of sound mixtures, even with sounds downloaded from the Internet, where we had no control over the recording conditions or the foreground noise. In the second set of experiments we recorded 15 challenging sound classes of similar spectrum from an application scenario and identified them in a continuous recording with three types of background noise. Our results outperform the state of the art in speed, precision and recall. (C) 2015 Elsevier B.V. All rights reserved.
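The representation described in the abstract (band-split frame-level features, their first and second derivatives, and one histogram set per band) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the choice of log band energy as the spectral feature, the frame length, hop size, number of bands, histogram bin count, and histogram ranges are all assumptions, and every function name is hypothetical. It uses a naive DFT from the standard library to stay self-contained.

```python
import cmath
import math


def frame_band_energies(signal, frame_len=64, hop=32, n_bands=4):
    """Log energy per frequency band for each frame (naive DFT, stdlib only)."""
    n_freq = frame_len // 2                 # keep non-negative frequencies only
    band_width = n_freq // n_bands
    # Hann window and DFT twiddle factors, computed once.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)] for k in range(n_freq)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        power = [abs(sum(x[n] * twiddle[k][n] for n in range(frame_len))) ** 2
                 for k in range(n_freq)]
        frames.append([
            math.log(sum(power[b * band_width:(b + 1) * band_width]) + 1e-12)
            for b in range(n_bands)
        ])
    return frames                           # indexed as frames[frame][band]


def delta(seq):
    """First-order difference along time, zero-padded to keep the length."""
    if not seq:
        return []
    return [0.0] + [b - a for a, b in zip(seq, seq[1:])]


def histogram(values, n_bins, lo, hi):
    """Normalized fixed-range histogram; out-of-range values are clipped."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins)
        counts[min(max(idx, 0), n_bins - 1)] += 1
    total = len(values) or 1
    return [c / total for c in counts]


def band_histograms(signal, n_bands=4, n_bins=8):
    """Per band: histograms of the feature and of its two derivatives."""
    frames = frame_band_energies(signal, n_bands=n_bands)
    descriptor = []
    for b in range(n_bands):
        feat = [f[b] for f in frames]
        d1 = delta(feat)                    # first derivative
        d2 = delta(d1)                      # second derivative
        descriptor.append(
            histogram(feat, n_bins, -30.0, 10.0)   # assumed value ranges
            + histogram(d1, n_bins, -5.0, 5.0)
            + histogram(d2, n_bins, -5.0, 5.0)
        )
    return descriptor
```

Because each histogram is normalized by the number of frames, the descriptor has the same size for a short clip and a long stream, which matches the abstract's statement that the final representation does not depend on the length of the input sound.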
Pages: 153-160
Number of pages: 8