ACOUSTIC SCENE ANALYSIS BASED ON LATENT ACOUSTIC TOPIC AND EVENT ALLOCATION

Cited by: 12

Authors:

Imoto, Keisuke [1]

Ohishi, Yasunori [2]

Uematsu, Hisashi [1]

Ohmuro, Hitoshi [1]

Affiliations:

[1] NTT Media Intelligence Labs, Audio Speech & Language Media Project, Kanagawa, Japan

[2] NTT Commun Sci Lab, Media Informat Lab, Kanagawa, Japan

Source:

2013 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2013

Keywords:

Acoustic event detection (AED); acoustic scene analysis; probabilistic generative model; CLASSIFICATION

DOI:

10.1109/MLSP.2013.6661957

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Subject classification code:

081202

Abstract:

We propose a model, called the latent acoustic topic and event allocation (LATEA) model, for analyzing acoustic scenes from long-term acoustic signals (longer than several seconds). It is based on a probabilistic generative model of an acoustic feature sequence associated with acoustic scenes (e.g., "cooking") and acoustic events (e.g., "cutting with a knife," "heating a skillet," or "running water"). By representing acoustic events and scenes as latent variables, the proposed model allows the analysis of a wide variety of sounds and the capture of abstract acoustic scenes; by representing acoustic features as a mixture of Gaussian components, it can also describe the acoustic similarity and variance between acoustic events. Experiments with real-life sounds indicated that the proposed model exhibited lower perplexity than conventional models and improved the stability of acoustic scene estimation. The experimental results also suggested that the proposed model describes the acoustic similarity and variance between acoustic events better than conventional models.
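The abstract describes a hierarchy of latent variables (scene-level topics and events) with Gaussian-mixture emissions, evaluated by perplexity. The following is a minimal illustrative sketch of that kind of hierarchy, not the authors' LATEA implementation. The per-frame sampling order (topic, then event, then Gaussian component), the sharing of Gaussian components across events, the one-dimensional features, and all function and parameter names are assumptions made for illustration only.

```python
import math
import random


def draw(weights, rng):
    """Draw an index with probability proportional to weights."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def sample_sequence(topic_probs, event_given_topic, component_given_event,
                    means, stds, n_frames, rng):
    """Sample 1-D features: topic -> event -> Gaussian component -> feature.

    Assumed structure (hypothetical): Gaussian components are shared by all
    events, so two events can overlap acoustically by weighting the same
    components.
    """
    frames = []
    for _ in range(n_frames):
        topic = draw(topic_probs, rng)
        event = draw(event_given_topic[topic], rng)
        comp = draw(component_given_event[event], rng)
        frames.append(rng.gauss(means[comp], stds[comp]))
    return frames


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def frame_log_likelihood(x, topic_probs, event_given_topic,
                         component_given_event, means, stds):
    """Log-density of one frame, marginalizing topic, event, and component."""
    total = 0.0
    for t, p_topic in enumerate(topic_probs):
        for e, p_event in enumerate(event_given_topic[t]):
            for c, p_comp in enumerate(component_given_event[e]):
                total += p_topic * p_event * p_comp * gaussian_pdf(
                    x, means[c], stds[c])
    return math.log(total)


def perplexity(log_likelihoods):
    """exp of the negative mean per-frame log-likelihood (natural log)."""
    return math.exp(-sum(log_likelihoods) / len(log_likelihoods))
```

Under these assumptions, a lower perplexity on held-out frames means the model assigns them higher average likelihood, which is the sense in which the abstract compares models.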

References

Pages: 6

18 entries in total

[1] Al Masum Shaikh Mostafa, 2008, 2008 11th International Conference on Computer and Information Technology (ICCIT), P294, DOI 10.1109/ICCITECHN.2008.4803018

[2] [Anonymous], P EUR SIGN PROC C

[3] [Anonymous], 2004, Proceedings of the International Conference on Knowledge Discovery and Data Mining (SIGKDD), DOI 10.1145/1014052

[4] Attias H, 2000, ADV NEUR IN, V12, P209

[5] Learning to classify text using support vector machines: Methods, theory, and algorithms [J]. Basili, R. COMPUTATIONAL LINGUISTICS, 2003, 29 (04): 655-661

[6] Latent Dirichlet allocation [J]. Blei, DM; Ng, AY; Jordan, MI. JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 3 (4-5): 993-1022

[7] Audio-based context recognition [J]. Eronen, AJ; Peltonen, VT; Tuomi, JT; Klapuri, AP; Fagerlund, S; Sorsa, T; Lorho, G; Huopaniemi, J. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (01): 321-329

[8] Finding scientific topics [J]. Griffiths, TL; Steyvers, M. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2004, 101: 5228-5235

[9] Harma A., 2005, P IEEE INT C MULT EX

[10] Imoto K., 2013, P INT
