A Graphical Representation and Dissimilarity Measure for Basic Everyday Sound Events

Cited by: 7

Authors:

Adiloglu, Kamil [1]

Annies, Robert [1]

Wahlen, Elio [2]

Purwins, Hendrik [3]

Obermayer, Klaus [4]

Affiliations:

[1] Tech Univ Berlin, D-10623 Berlin, Germany

[2] Hamburg Univ Appl Sci, D-20099 Hamburg, Germany

[3] Univ Pompeu Fabra, Mus Technol Grp, Dept Informat & Commun Technol, Barcelona 08018, Spain

[4] Tech Univ Berlin, Neural Informat Proc Grp, NI, D-10587 Berlin, Germany

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 5

Keywords:

Audio analysis and synthesis; audio coding; RECOGNITION; MODEL;

DOI:

10.1109/TASL.2012.2184752

Chinese Library Classification:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Studies by Gaver (W. W. Gaver, "How do we hear in the world? Explorations in ecological acoustics," Ecological Psychology, 1993) revealed that humans categorize everyday sounds according to the processes that generated them. He defined these categories in a taxonomy based on the aggregate states of the materials involved (solid, liquid, gas) and the physical nature of the sound-generating interaction, such as deformation or friction for solids. We exemplified this taxonomy in an everyday sound database that contains recordings of basic isolated sound events from these categories. We used a sparse method to represent and visualize these sound events. This representation relies on a sparse decomposition of sounds into atomic filter functions in the time-frequency domain. The filter functions maximally correlated with a given sound are selected automatically to perform the decomposition. The resulting sparse point pattern depicts the skeleton of the given sound. Visualizing these point patterns revealed that acoustically similar sounds have similar point patterns. To detect these similarities, we defined a novel dissimilarity function by treating the point patterns as 3-D point graphs and applying a graph matching algorithm, which assigns the points of one sound to the points of the other. This dissimilarity measure is used in combination with a kernel machine for the classification experiments, yielding an average accuracy of 95% in one-versus-one discrimination tasks.
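The idea of comparing two sounds by assigning the points of one pattern to the points of the other can be illustrated with a minimal sketch. The abstract does not specify the graph matching algorithm or the cost function, so everything below is an assumption made for illustration: the function name `assignment_dissimilarity` is hypothetical, each point is taken to be a 3-D tuple, the cost is plain Euclidean distance, and the cheapest one-to-one assignment is found by brute force, which is only feasible for very small patterns.

```python
from itertools import permutations
from math import dist


def assignment_dissimilarity(points_a, points_b):
    """Mean Euclidean distance under the cheapest one-to-one assignment.

    Illustrative stand-in only, not the paper's graph matching algorithm.
    Each point of the smaller pattern is assigned to a distinct point of
    the larger pattern; all assignments are enumerated by brute force.
    """
    # Always assign from the smaller pattern to the larger one.
    if len(points_a) > len(points_b):
        points_a, points_b = points_b, points_a
    if not points_a:
        raise ValueError("both point patterns must be non-empty")

    # Try every ordered selection of len(points_a) points from points_b
    # and keep the total cost of the cheapest pairing.
    best_total = min(
        sum(dist(p, q) for p, q in zip(points_a, candidate))
        for candidate in permutations(points_b, len(points_a))
    )
    return best_total / len(points_a)


# Two tiny hypothetical patterns; the second lists the same points in a different order.
pattern_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pattern_b = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
# A third pattern, shifted by one unit along the third axis.
pattern_c = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]

same = assignment_dissimilarity(pattern_a, pattern_b)
shifted = assignment_dissimilarity(pattern_a, pattern_c)
```

Because the measure depends only on the best assignment, the order in which points are listed does not matter. A practical implementation would replace the brute-force search with a polynomial-time assignment solver and would likely weight the three axes differently.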


Pages: 1542-1552

Page count: 11
