A GENERIC CLASSIFICATION SYSTEM FOR MULTI-CHANNEL AUDIO INDEXING: APPLICATION TO SPEECH AND MUSIC DETECTION

Citations: 0
Authors
Benaroya, Elie-Laurent [1]
Peeters, Geoffroy [1]
Affiliations
[1] STMS IRCAM CNRS UPMC, Sound Analysis Synth Team, F-75004 Paris, France
Source
2013 14TH INTERNATIONAL WORKSHOP ON IMAGE ANALYSIS FOR MULTIMEDIA INTERACTIVE SERVICES (WIAMIS) | 2013
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The growing number of 3D audio-visual productions and archives creates a need to index 3D content. Event detection using the audio modality is a difficult task. The standard approach to classifying 3D audio is to first downmix it to mono and then classify the mono signal. In this paper, we describe a generic classifier for multi-channel audio event detection and propose several information fusion strategies. Our system is evaluated on a speech and music detection task using the audio of 3D movies. Compared with the standard downmixing method, we improve classification performance on our database by 1.5% for speech detection and 8% for music detection. We also compare several information fusion methods in the experiments.
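The contrast the abstract draws, between classifying a mono downmix and fusing information across channels, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the function names, the example posteriors, and the sum, product, and max combination rules are generic late-fusion choices, not necessarily the strategies evaluated in the paper.

```python
from math import prod

def downmix_to_mono(channels):
    """Average equal-length channels sample by sample (the baseline downmix)."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]

def fuse_posteriors(per_channel_probs, rule="sum"):
    """Late fusion: combine per-channel class posteriors into one distribution.

    per_channel_probs: one probability vector per channel.
    rule: "sum" (average), "product", or "max". These are generic
          combination rules, assumed here for illustration only.
    """
    per_class = list(zip(*per_channel_probs))  # regroup scores by class
    if rule == "sum":
        scores = [sum(p) / len(p) for p in per_class]
    elif rule == "product":
        scores = [prod(p) for p in per_class]
    elif rule == "max":
        scores = [max(p) for p in per_class]
    else:
        raise ValueError(f"unknown rule: {rule}")
    total = sum(scores)
    return [s / total for s in scores]  # renormalise so the scores sum to 1

# Hypothetical posteriors [P(music), P(no music)] from three channels.
probs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]
fused_sum = fuse_posteriors(probs, "sum")
fused_max = fuse_posteriors(probs, "max")

# Hypothetical three-channel signal with two samples per channel.
mono = downmix_to_mono([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

In this toy case only one channel is confident that music is present. Per-channel fusion lets that channel influence the decision directly, whereas a downmix averages the signals before any classifier sees them.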
Pages: 4