Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network

Cited by: 0

Authors:

Venkatesh, Spoorthy [1]

Koolagudi, Shashidhar G. [1]

Affiliations:

[1] Natl Inst Technol Karnataka, Surathkal 575025, India

Source:

COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT I | 2024 / Vol. 2009

Keywords:

Polyphonic Sound Event Detection (SED); Constant Q-Transform (CQT); Deep learning; Modified Recurrent Temporal Pyramid Network; CLASSIFICATION; SCENES;

DOI:

10.1007/978-3-031-58181-6_47

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, a novel approach to polyphonic Sound Event Detection (SED) is presented. A new deep learning architecture named "Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN)" is introduced. The input features fed to the network are spectrograms generated with the Constant Q-Transform (CQT). CQT spectrograms provided better sound event information from the audio recording than the Short Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is essential for detecting the onset and offset of events in an audio recording. It is captured by fusing temporal pyramids and bi-directional long short-term memory (LSTM) recurrent layers in the deep learning architecture. Extensive experiments are carried out on three benchmark datasets, and the results of the proposed method are superior to those of existing polyphonic SED systems.
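To illustrate why a CQT front end differs from an STFT, the following minimal sketch computes the geometrically spaced center frequencies and per-bin analysis window lengths from the standard constant-Q definition. It is not taken from the paper: the function name `cqt_bins` and all parameter values (minimum frequency, bins per octave, number of bins, sample rate) are illustrative assumptions, and the abstract does not state which settings the authors used.

```python
import math


def cqt_bins(f_min=32.70, bins_per_octave=12, n_bins=84, sample_rate=22050):
    """Return the quality factor Q and a list of (center_hz, window_samples).

    All default values are illustrative assumptions, not the paper's settings.
    """
    # Q is the ratio of center frequency to bandwidth; it is the same for
    # every bin, which is what "constant Q" refers to.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

    bins = []
    for k in range(n_bins):
        # Center frequencies are spaced geometrically, so each octave
        # contains the same number of bins.
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # Window length shrinks as frequency rises: long windows for low
        # bins, short windows for high bins.
        n_k = math.ceil(q * sample_rate / f_k)
        bins.append((f_k, n_k))
    return q, bins


q, bins = cqt_bins()
for f_k, n_k in bins[::12]:
    print(f"{f_k:9.2f} Hz  window = {n_k} samples")
```

By contrast, an STFT uses one fixed window length and linearly spaced bins, so its frequency resolution is the same at every frequency. The shorter high-frequency windows of a constant-Q analysis give finer time resolution there, which is one plausible reason such features suit onset and offset detection.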


Pages: 554-564

Number of pages: 11
