Acoustic event detection in meeting-room environments

Cited by: 68
Authors
Temko, Andrey [1]
Nadeu, Climent [1]
Affiliations
[1] Univ Politecn, TALP Res Ctr, Barcelona 08034, Spain
Keywords
Acoustic event detection; Temporal overlaps; Support vector machines; Confusion-based clustering; CLASSIFICATION;
DOI
10.1016/j.patrec.2009.06.009
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in the signals that are captured by one or several microphones. The AED problem has been recently proposed for meeting-room or class-room environments, where a specific set of meaningful sounds has been defined, and several evaluations have been carried out (within the international CLEAR evaluation campaigns). This paper reports some work in AED done by the authors in that framework, and particularly presents the extension to the difficult problem of detecting overlapped sounds. Actually, temporal overlaps accounted for more than 70% of errors in the real-world interactive seminar recordings used in CLEAR 2007 evaluations. An attempt to deal with that problem at the level of models using our SVM-based AED system is reported in the paper. The proposed two-step system noticeably outperforms the baseline system for both an artificially generated database and a real seminar recording database. The databases and metrics developed for the CLEAR 2007 evaluations are also described. Finally, a real-time AED system implemented in the UPC's smart-room using several microphones is reported, along with a GUI-based demo that includes also the output of an acoustic source localization system. (C) 2009 Elsevier B.V. All rights reserved.
Pages: 1281-1288
Page count: 8