Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network

Cited by: 0

Authors:

Venkatesh, Spoorthy [1]

Koolagudi, Shashidhar G. [1]

Affiliation:

[1] National Institute of Technology Karnataka, Surathkal 575025, India

Source:

COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT I | 2024, Vol. 2009

Keywords:

Polyphonic Sound Event Detection (SED); Constant Q-Transform (CQT); Deep learning; Modified Recurrent Temporal Pyramid Network; CLASSIFICATION; SCENES;

DOI:

10.1007/978-3-031-58181-6_47

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, a novel approach to polyphonic Sound Event Detection (SED) is presented. A new deep learning architecture named "Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN)" is introduced. The input features fed to the network are spectrograms generated from the Constant Q-Transform (CQT). CQT spectrograms captured sound event information in audio recordings better than the Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is essential for detecting the onset and offset of events in an audio recording. The architecture captures this temporal information by fusing temporal pyramids with bidirectional long short-term memory (LSTM) recurrent layers. Extensive experiments are carried out on three benchmark datasets, and the results of the proposed method are superior to those of existing polyphonic SED systems.
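As a rough illustration of how CQT features differ from STFT features, the sketch below compares their frequency grids: CQT bins are geometrically spaced, so every bin has the same ratio Q of center frequency to bandwidth, while STFT bins are linearly spaced. This is a generic sketch of the standard CQT definition, not the authors' feature pipeline; the function names and the values of `f_min`, `n_bins`, `bins_per_octave`, `sample_rate`, and `n_fft` are assumptions chosen for illustration and do not come from the paper.

```python
def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced CQT center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_quality_factor(bins_per_octave):
    """Q = f_k / (f_{k+1} - f_k), which is the same for every bin k."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def stft_center_frequencies(sample_rate, n_fft):
    """Linearly spaced STFT bin frequencies: f_k = k * sample_rate / n_fft."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


# Illustrative settings only (assumptions, not taken from the paper).
cqt_freqs = cqt_center_frequencies(f_min=32.70, n_bins=84, bins_per_octave=12)
stft_freqs = stft_center_frequencies(sample_rate=44100, n_fft=2048)
q = cqt_quality_factor(bins_per_octave=12)

# CQT: the ratio between neighboring bins is constant.
cqt_ratios = [b / a for a, b in zip(cqt_freqs, cqt_freqs[1:])]
# STFT: the difference between neighboring bins is constant.
stft_steps = [b - a for a, b in zip(stft_freqs, stft_freqs[1:])]

print("Q factor:", round(q, 3))
print("first CQT bins (Hz):", [round(f, 2) for f in cqt_freqs[:4]])
print("first STFT bins (Hz):", [round(f, 2) for f in stft_freqs[:4]])
```

With geometric spacing, low frequencies get finer frequency resolution than a linear grid of comparable size would give them, which is one commonly cited reason for preferring CQT on audio with energy spread over many octaves.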


Pages: 554-564

Page count: 11
