CAPTURING TEMPORAL DEPENDENCIES THROUGH FUTURE PREDICTION FOR CNN-BASED AUDIO CLASSIFIERS

Cited by: 2

Authors:

Song, Hongwei [1]

Han, Jiqing [1]

Deng, Shiwen [2]

Du, Zhihao [1]

Affiliations:

[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China

[2] Harbin Normal University, School of Mathematical Sciences, Harbin, China

Source:

2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021) | 2021

Funding:

National Natural Science Foundation of China

Keywords:

Audio classification; temporal dependency modeling; hierarchical contrastive predictive coding; convolutional neural networks; classification

DOI:

10.1109/ICASSP39728.2021.9414018

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper focuses on the problem of temporal dependency modeling in the CNN-based models for audio classification tasks. To capture audio temporal dependencies using CNNs, we take a different approach from the purely architecture-induced method and explicitly encode temporal dependencies into the CNN-based audio classifiers. More specifically, in addition to the classification objective, we require the CNN model to solve an auxiliary task of predicting the future features, which is formulated by leveraging the Contrastive Predictive Coding (CPC) loss. Furthermore, a novel hierarchical CPC (HCPC) model is proposed for capturing multi-level temporal dependencies at the same time. The proposed model is evaluated on a wide range of non-speech audio signals, including musical and in-the-wild environmental audio signals. We show that the proposed approach improves the backbone CNNs consistently on all tested benchmark datasets and outperforms a DenseNet model trained from scratch.
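The training objective described in the abstract (a classification loss plus an auxiliary future-prediction loss in the CPC style) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the dot-product scoring, the function names, and the `weight` parameter are all assumptions made here, and the hierarchical (HCPC) variant is not shown.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def info_nce(predicted, candidates, positive_index):
    """Contrastive (InfoNCE-style) loss for one predicted future feature.

    `candidates` holds the true future feature at `positive_index` plus
    negative samples; the loss is the cross-entropy of identifying the
    true one. Dot-product scoring is an assumption of this sketch.
    """
    scores = [dot(predicted, c) for c in candidates]
    return log_sum_exp(scores) - scores[positive_index]


def cross_entropy(logits, label):
    """Standard softmax cross-entropy for a single example."""
    return log_sum_exp(logits) - logits[label]


def joint_loss(logits, label, predicted, candidates, positive_index, weight=1.0):
    """Classification loss plus a weighted auxiliary prediction loss.

    `weight` is a hypothetical balancing factor, not taken from the paper.
    """
    return cross_entropy(logits, label) + weight * info_nce(
        predicted, candidates, positive_index
    )


if __name__ == "__main__":
    # A prediction aligned with the true future feature gives a low loss.
    print(info_nce([5.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]], positive_index=0))
```

In this sketch the auxiliary loss is small when the predicted feature scores the true future feature above the negatives, so minimizing the joint loss pushes the network toward features that carry information about what comes next in the audio.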


Pages: 101-105

Page count: 5

