Audio Surveillance: A Systematic Review

Cited by: 158
Authors
Crocco, Marco [1]
Cristani, Marco [2]
Trucco, Andrea [3]
Murino, Vittorio [1]
Affiliations
[1] Ist Italiano Tecnol, Pattern Anal & Comp Vis, Pisa, Italy
[2] Univ Verona, Dept Comp Sci, Ca Vignal 2,Str Le Grazie 15, I-37134 Verona, Italy
[3] Univ Genoa, Dipartimento Ingn Navale Elettr Elettr & Telecomu, Via Opera Pia 11, I-16145 Genoa, Italy
Keywords
Algorithms; Security; Automated surveillance; audio surveillance; multimodal surveillance; PASSIVE SOURCE LOCALIZATION; EVENT DETECTION; ROBUST LOCALIZATION; PROBABILISTIC MODEL; OBJECT LOCALIZATION; BAND SIGNALS; CLASSIFICATION; SOUND; TIME; INFORMATION
DOI
10.1145/2871183
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Although surveillance systems are becoming increasingly ubiquitous in our living environment, automated surveillance, currently based on the video sensory modality and machine intelligence, most of the time lacks the robustness and reliability required in several real applications. To tackle this issue, audio sensory devices have been incorporated, either alone or in combination with video, giving rise in the past decade to a considerable amount of research. In this article, audio-based automated surveillance methods are organized into a comprehensive survey: a general taxonomy, inspired by the more widespread video surveillance field, is proposed to systematically describe the methods, covering background subtraction, event classification, object tracking, and situation analysis. For each of these tasks, all the significant works are reviewed, detailing their pros and cons and the context for which they have been proposed. Moreover, a specific section is devoted to audio features, discussing their expressiveness and their employment in the above-described tasks. Unlike other surveys on audio processing and analysis, the present one is specifically targeted at automated surveillance, highlighting the target applications of each described method and providing the reader with a systematic and schematic view useful for retrieving the most suitable algorithms for each specific requirement.
Pages: 46