Attentive Convolutional Recurrent Neural Network Using Phoneme-Level Acoustic Representation for Rare Sound Event Detection

Cited by: 3

Authors:

Upadhyay, Shreya G. [1,2]

Su, Bo-Hao [1,2]

Lee, Chi-Chun [1,2]

Affiliations:

[1] Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan

[2] MOST Joint Res Ctr AI Technol & All Vista Healthc, Hsinchu, Taiwan

Source:

INTERSPEECH 2020 | 2020

Keywords:

sound event detection; convolution recurrent neural network; attention; automatic speech recognition; CLASSIFICATION;

DOI:

10.21437/Interspeech.2020-2585

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

A well-trained acoustic sound event detection (SED) system captures sound patterns to accurately detect events of interest in an auditory scene, enabling applications in multimedia, smart living, and even health monitoring. Because sound event data are scarce and weakly labelled, it is often challenging to train an accurate and robust acoustic event detection model directly, especially for rare events. In this paper, we propose an architecture that takes advantage of automatic speech recognition (ASR) network representations as additional input when training a sound event detector. We use a convolutional bi-directional recurrent neural network (CBRNN) with both spectral and temporal attention as the SED classifier, and combine it with the ASR feature representations during end-to-end CBRNN training. Our experiments on the TUT 2017 rare sound event detection dataset show that including ASR features improves the overall discriminative performance of the end-to-end sound event detection system; the average F-score and error rate of our proposed framework are 97% and 0.05%, respectively.
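The abstract reports results as an F-score and an error rate. As a rough illustration of how these two quantities are commonly computed in sound event detection, the sketch below uses the segment-based definitions (substitutions, deletions, and insertions counted per segment). This is an assumption for illustration only: the abstract does not state the exact evaluation protocol, and the function names and toy labels here are hypothetical, not taken from the paper.

```python
from typing import Sequence, Set, Tuple


def segment_counts(reference: Set[str], predicted: Set[str]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one segment."""
    tp = len(reference & predicted)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    return tp, fp, fn


def f_score_and_error_rate(
    references: Sequence[Set[str]],
    predictions: Sequence[Set[str]],
) -> Tuple[float, float]:
    """Segment-based F-score and error rate (illustrative, hypothetical helper).

    Each element of `references` / `predictions` is the set of event labels
    active in one fixed-length segment.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")

    tp_total = fp_total = fn_total = 0
    substitutions = deletions = insertions = 0
    n_reference_events = 0

    for ref, pred in zip(references, predictions):
        tp, fp, fn = segment_counts(ref, pred)
        tp_total += tp
        fp_total += fp
        fn_total += fn
        # A false positive paired with a false negative in the same segment
        # counts as one substitution; the remainder is a deletion or insertion.
        substitutions += min(fp, fn)
        deletions += max(0, fn - fp)
        insertions += max(0, fp - fn)
        n_reference_events += len(ref)

    denom = 2 * tp_total + fp_total + fn_total
    f_score = 2 * tp_total / denom if denom else 0.0
    error_rate = (
        (substitutions + deletions + insertions) / n_reference_events
        if n_reference_events
        else 0.0
    )
    return f_score, error_rate


# Toy example with made-up segments (labels chosen only for illustration).
refs = [{"gunshot"}, set(), {"glassbreak"}, {"babycry"}]
preds = [{"gunshot"}, {"babycry"}, set(), {"babycry"}]
f, er = f_score_and_error_rate(refs, preds)
print(f"F-score = {f:.3f}, error rate = {er:.3f}")
```

In this toy case there are two correct detections, one insertion, and one deletion against three reference events, so both values come out to 2/3. Under these definitions the error rate is a ratio that can exceed 1 and is not bounded like a percentage.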


Pages: 3102-3106

Number of pages: 5

