ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis And GMM

Cited by: 7

Authors:

Sharma, Pulkit ^{[1]}

Abrol, Vinayak ^{[2]}

Thakur, Anshul ^{[1]}

Affiliations:

[1] IIT Mandi, Suran, Himachal Pradesh, India

[2] Idiap Res Inst, Martigny, Switzerland

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Archetypal analysis; deep matrix factorization; acoustic scene classification

DOI:

10.21437/Interspeech.2018-1481

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we propose a deep learning framework that combines the generalizability of Gaussian mixture models (GMM) with the discriminative power of deep matrix factorization to learn an acoustic scene embedding (ASe) for the acoustic scene classification task. The proposed approach first builds a Gaussian mixture model-universal background model (GMM-UBM) using frame-wise spectral representations. This UBM is adapted to a waveform, and the likelihood for each spectral frame representation is stored as a feature matrix. This matrix is fed to a deep matrix factorization pipeline (with recording-level max-pooling) to compute a sparse-convex discriminative representation. The proposed deep factorization model is based on archetypal analysis, a form of convex NMF, which has been shown to be well suited for audio analysis. Finally, the obtained representation is mapped to a class label using a dictionary-based auto-encoder consisting of a linear, symmetric encoder and decoder with an efficient learning algorithm. The encoder projects the ASe of a waveform to the label space, while the decoder ensures that the feature can be reconstructed, resulting in better generalization on the test data.
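The likelihood feature matrix and the recording-level max-pooling mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a diagonal-covariance GMM whose parameters are random placeholders instead of a trained and adapted UBM, it omits the deep archetypal factorization layers entirely, and it applies pooling directly to the likelihood matrix only to show the operation. The function names `component_log_likelihoods` and `max_pool_recording` are hypothetical.

```python
import numpy as np


def component_log_likelihoods(frames, weights, means, variances):
    """Return a (K, T) matrix of weighted per-component log-likelihoods.

    frames:    (T, D) spectral frames of one recording
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal covariances (assumption: diagonal GMM)
    """
    diff = frames[None, :, :] - means[:, None, :]               # (K, T, D)
    mahal = np.sum(diff ** 2 / variances[:, None, :], axis=2)   # (K, T)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    return np.log(weights)[:, None] + log_norm[:, None] - 0.5 * mahal


def max_pool_recording(feature_matrix):
    """Collapse the time axis: one value per row for the whole recording."""
    return feature_matrix.max(axis=1)


# Placeholder data: 50 frames, 4 spectral dimensions, 3 mixture components.
rng = np.random.default_rng(0)
T, D, K = 50, 4, 3
frames = rng.normal(size=(T, D))
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

loglik = component_log_likelihoods(frames, weights, means, variances)
pooled = max_pool_recording(loglik)
print(loglik.shape, pooled.shape)
```

Each column of `loglik` describes one frame and each row one mixture component, so pooling over the time axis yields a fixed-length vector regardless of the recording's duration.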


Pages: 3299-3303

Page count: 5
