CAPTURING TEMPORAL DEPENDENCIES THROUGH FUTURE PREDICTION FOR CNN-BASED AUDIO CLASSIFIERS

Cited by: 2
Authors
Song, Hongwei [1]
Han, Jiqing [1]
Deng, Shiwen [2]
Du, Zhihao [1]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
[2] Harbin Normal University, School of Mathematical Sciences, Harbin, People's Republic of China
Source
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021
Funding
National Natural Science Foundation of China
Keywords
Audio classification; temporal dependency modeling; hierarchical contrastive predictive coding; convolutional neural networks; classification
DOI
10.1109/ICASSP39728.2021.9414018
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper addresses temporal dependency modeling in CNN-based models for audio classification. Rather than relying purely on dependencies induced by the network architecture, we explicitly encode temporal dependencies into CNN-based audio classifiers. Specifically, in addition to the classification objective, the CNN model is required to solve an auxiliary task of predicting future features, formulated with the Contrastive Predictive Coding (CPC) loss. Furthermore, we propose a novel hierarchical CPC (HCPC) model that captures multi-level temporal dependencies simultaneously. The proposed model is evaluated on a wide range of non-speech audio signals, including music and in-the-wild environmental audio. We show that the proposed approach consistently improves the backbone CNNs on all tested benchmark datasets and outperforms a DenseNet model trained from scratch.
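The auxiliary objective described above can be illustrated with a minimal NumPy sketch. It assumes the standard InfoNCE form of the CPC loss (one linear predictor `W[k-1]` per step ahead k, with the other time steps of the same clip serving as negatives), a cumulative-mean context in place of the autoregressive summarizer CPC normally uses, and a hypothetical weight `lam` on the auxiliary term. The hierarchical variant is approximated here by summing the CPC loss over feature sequences taken from different CNN stages. The helper names (`causal_context`, `cpc_loss`, `total_loss`) and all of these design choices are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    m = x.max(axis=axis, keepdims=True)
    return x - m - np.log(np.exp(x - m).sum(axis=axis, keepdims=True))


def causal_context(z):
    # Hypothetical causal summary: c_t is the mean of z_1..z_t.
    # (CPC usually uses a GRU or similar autoregressive model here.)
    T = z.shape[0]
    return np.cumsum(z, axis=0) / np.arange(1, T + 1)[:, None]


def cpc_loss(z, c, W):
    """InfoNCE loss for predicting future frame features.

    z: (T, D) frame features, e.g. a CNN feature map pooled over frequency
    c: (T, D) context vectors summarizing frames up to each time step
    W: list of K (D, D) linear predictors, one per step ahead k = 1..K
    """
    T = z.shape[0]
    if len(W) >= T:
        raise ValueError("need more time steps than prediction steps")
    losses = []
    for k, Wk in enumerate(W, start=1):
        pred = c[: T - k] @ Wk        # predictions of z_{t+k}
        targets = z[k:]               # true future features
        scores = pred @ targets.T     # diagonal entries are the positive pairs
        logp = log_softmax(scores, axis=1)
        losses.append(-np.mean(np.diag(logp)))
    return float(np.mean(losses))


def total_loss(logits, label, levels, lam=1.0):
    # Classification cross-entropy plus the weighted auxiliary CPC terms.
    # levels: list of (z, c, W) tuples, one per CNN stage (assumed hierarchy).
    ce = -log_softmax(logits)[label]
    aux = sum(cpc_loss(z, c, W) for z, c, W in levels)
    return float(ce + lam * aux)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K = 20, 8, 3
    z = rng.standard_normal((T, D))
    c = causal_context(z)
    W = [0.1 * rng.standard_normal((D, D)) for _ in range(K)]
    print("CPC loss:", cpc_loss(z, c, W))
    print("total loss:", total_loss(np.zeros(4), 1, [(z, c, W)], lam=0.5))
```

With all predictors set to zero, every candidate frame scores equally and each step's loss reduces to log(T - k), which is a convenient sanity check; training lowers the loss only when the context genuinely predicts its own future frames better than the other frames of the clip.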
Pages: 101-105
Number of pages: 5