Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network

Cited by: 0
Authors
Venkatesh, Spoorthy [1]
Koolagudi, Shashidhar G. [1]
Affiliations
[1] National Institute of Technology Karnataka, Surathkal 575025, India
Source
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT I | 2024 / Vol. 2009
Keywords
Polyphonic Sound Event Detection (SED); Constant Q-Transform (CQT); Deep learning; Modified Recurrent Temporal Pyramid Network; CLASSIFICATION; SCENES
DOI
10.1007/978-3-031-58181-6_47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel approach to polyphonic Sound Event Detection (SED) based on a new deep learning architecture, the Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN). The network takes spectrograms generated with the Constant Q-Transform (CQT) as input features. CQT spectrograms provided better sound event information about the audio recordings than the Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is essential for detecting the onset and offset of events in an audio recording; the architecture captures it by fusing temporal pyramids with bidirectional Long Short-Term Memory (LSTM) recurrent layers. Extensive experiments on three benchmark datasets show that the proposed method outperforms existing polyphonic SED systems.
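The two ideas the abstract leans on, constant-Q frequency spacing and temporal pyramids, can be illustrated with a minimal sketch. This is not the MR-TPNN implementation: the abstract gives no parameter values, so `f_min`, `bins_per_octave`, and the pyramid `levels` below are illustrative assumptions, and `temporal_pyramid_max_pool` is a generic temporal pyramid pooling routine rather than the authors' layer. The sketch only shows that CQT bins are geometrically spaced with one shared Q factor, and how a frame sequence can be summarized at several temporal resolutions.

```python
import math


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Return geometrically spaced CQT bin center frequencies in Hz.

    Bin k sits at f_min * 2^(k / bins_per_octave), so every
    bins_per_octave bins the frequency doubles.
    """
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Ratio of center frequency to bandwidth; the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def temporal_pyramid_max_pool(frames, levels=(1, 2, 4)):
    """Max-pool a non-empty sequence of per-frame values at several scales.

    For each level, the sequence is split into that many contiguous
    segments and the maximum of each segment is kept. (Generic sketch;
    the levels are an assumption, not taken from the paper.)
    """
    n = len(frames)
    if n == 0:
        raise ValueError("frames must be non-empty")
    pooled = []
    for parts in levels:
        for i in range(parts):
            start = (i * n) // parts
            # Guarantee at least one frame per segment for short inputs.
            end = max(((i + 1) * n) // parts, start + 1)
            pooled.append(max(frames[start:end]))
    return pooled


# Hypothetical settings: 12 bins per octave starting near C1 (32.70 Hz).
freqs = cqt_center_frequencies(32.70, 24, 12)
q = cqt_q_factor(12)

# Hypothetical per-frame activations for one event class.
summary = temporal_pyramid_max_pool([1, 5, 2, 8, 3, 0, 4, 7])
```

With these assumed settings, `freqs[12]` is one octave above `freqs[0]`, and `summary` holds one value for the whole clip, two for its halves, and four for its quarters, which is the coarse-to-fine view a temporal pyramid provides.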
Pages: 554-564
Number of pages: 11