Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network

Cited by: 0
Authors
Venkatesh, Spoorthy [1]
Koolagudi, Shashidhar G. [1]
Affiliations
[1] National Institute of Technology Karnataka, Surathkal 575025, India
Source
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT I | 2024 / Vol. 2009
Keywords
Polyphonic Sound Event Detection (SED); Constant Q-Transform (CQT); Deep learning; Modified Recurrent Temporal Pyramid Network; CLASSIFICATION; SCENES
DOI
10.1007/978-3-031-58181-6_47
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel approach to polyphonic Sound Event Detection (SED) built on a new deep learning architecture, the "Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN)". The input features fed to the network are spectrograms generated with the Constant Q-Transform (CQT). CQT spectrograms provided better sound event information from the audio recordings than the Short Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is essential for detecting the onset and offset of events in an audio recording; the architecture captures it by fusing temporal pyramids with bidirectional long short-term memory (LSTM) recurrent layers. Extensive experiments on three benchmark datasets show that the proposed method outperforms existing polyphonic SED systems.
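As a rough illustration of why a CQT front end differs from a fixed-window STFT, the sketch below computes the geometrically spaced center frequencies and per-bin analysis window lengths of a constant-Q filter bank. It is not the authors' feature pipeline: the function name `cqt_bins` and every parameter value (minimum frequency, number of bins, bins per octave, sample rate) are assumptions chosen only for the example.

```python
import math


def cqt_bins(f_min=32.70, n_bins=84, bins_per_octave=12, sample_rate=22050):
    """Return (center_frequency_hz, window_length_samples) for each CQT bin.

    All default values are illustrative assumptions, not taken from the paper.
    """
    # Constant quality factor: center frequency divided by bandwidth,
    # identical for every bin (hence "constant Q").
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        # Center frequencies are spaced geometrically: doubling every octave.
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # Window length shrinks as frequency grows, so low bins get fine
        # frequency resolution and high bins get fine time resolution.
        n_k = math.ceil(q * sample_rate / f_k)
        bins.append((f_k, n_k))
    return bins


bins = cqt_bins()
lowest_freq, lowest_window = bins[0]
highest_freq, highest_window = bins[-1]
print(f"lowest bin:  {lowest_freq:.2f} Hz, window {lowest_window} samples")
print(f"highest bin: {highest_freq:.2f} Hz, window {highest_window} samples")
```

An STFT, by contrast, uses one window length for all frequencies, so its bins are linearly spaced. The shorter windows at high frequencies are one plausible reason a CQT representation can localize event onsets and offsets well, though the abstract does not state the authors' own explanation.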
Pages: 554-564
Number of pages: 11