ZERO-SHOT AUDIO CLASSIFICATION BASED ON CLASS LABEL EMBEDDINGS

Cited by: 0

Authors:

Xie, Huang [1]

Virtanen, Tuomas [1]

Affiliations:

[1] Tampere Univ, Tampere 33720, Finland

Source:

2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) | 2019

Funding:

European Research Council;

Keywords:

zero-shot learning; audio classification; class label embedding;

DOI:

10.1109/waspaa.2019.8937283

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper proposes a zero-shot learning approach for audio classification based on textual information about class labels, without any audio samples from the target classes. We propose an audio classification system built on a bilinear model, which takes audio feature embeddings and semantic class label embeddings as input and measures the compatibility between an audio feature embedding and a class label embedding. We use VGGish to extract audio feature embeddings from audio recordings. We treat textual labels as semantic side information about audio classes and use Word2Vec to generate class label embeddings. Results on the ESC-50 dataset show that the proposed system can perform zero-shot audio classification with a small training dataset. It achieves accuracy (26% on average) better than random guessing (10%) on each audio category. In particular, it reaches up to 39.7% for the category of natural audio classes.
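The bilinear compatibility described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `compatibility` and `predict`, the weight matrix `W`, and the toy two- and three-dimensional vectors are all hypothetical stand-ins. In a real system the audio embeddings would presumably come from VGGish, the label embeddings from Word2Vec, and `W` would be learned from the training classes; none of that is shown here.

```python
from typing import Dict, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def compatibility(audio_emb: Vector, weights: Matrix, label_emb: Vector) -> float:
    """Bilinear score: audio_emb^T * weights * label_emb."""
    return sum(
        a * w * l
        for a, row in zip(audio_emb, weights)
        for w, l in zip(row, label_emb)
    )


def predict(audio_emb: Vector, weights: Matrix, label_embs: Dict[str, Vector]) -> str:
    """Return the class whose label embedding is most compatible with the audio."""
    return max(
        label_embs,
        key=lambda name: compatibility(audio_emb, weights, label_embs[name]),
    )


# Toy data (assumed for illustration only, not taken from the paper).
audio = [1.0, 0.0]                      # stand-in for an audio feature embedding
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # stand-in for a learned weight matrix
labels = {                              # stand-ins for class label embeddings
    "dog": [0.9, 0.1, 0.0],
    "rain": [0.1, 0.9, 0.0],
}

predicted = predict(audio, W, labels)
print(predicted)
```

Because the score depends on a class only through its label embedding, classes never seen during training can be scored as long as a label embedding is available for them, which is what makes the zero-shot setting possible.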


Pages: 264-267

Page count: 4

