Frequency-based CNN and attention module for acoustic scene classification

Times Cited: 6
Authors
Aryal, Nisan [1 ]
Lee, Sang-Woong [2 ]
Affiliations
[1] Gachon Univ, Dept IT Convergence Engn, Seongnam 13120, South Korea
[2] Gachon Univ, Dept Software, Seongnam 13120, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Acoustic scene classification; Attention; CBAM; SENet; Attention module; DCASE;
DOI
10.1016/j.apacoust.2023.109411
CLC Number
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
Acoustic scene classification (ASC) is an audio classification task that identifies the environment in which sounds are recorded. Audio-related machine learning algorithms suffer from the device mismatch problem; that is, when trained on audio data recorded with one device, the algorithms cannot generalize to audio samples recorded with another device. In this study, a novel convolutional neural network, called a frequency-aware convolutional neural network (FACNN), is introduced to address the device mismatch problem by focusing on the frequency information of the audio samples. Furthermore, an attention module, called the frequency attention network (FANet), is introduced to generate an attention map based on the frequency information of the input feature maps. FANet helps the FACNN focus on the important frequency information, thus improving performance. The proposed method is trained on the TAU Urban Acoustic Scenes 2019 Mobile development dataset and the TAU Urban Acoustic Scenes 2020 Mobile development dataset. The proposed method achieves a state-of-the-art accuracy of 75.99% on the TAU Urban Acoustic Scenes 2019 Mobile development dataset and a competitive result of 72.6% on the TAU Urban Acoustic Scenes 2020 Mobile development dataset. In addition, a comparison of FANet with the convolutional block attention module (CBAM) and the squeeze-and-excitation network (SENet) was performed. The results show that FANet can mitigate the device mismatch problem by improving performance on unseen devices. (c) 2023 Elsevier Ltd. All rights reserved.
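The abstract describes a frequency-wise attention mechanism in the squeeze-and-excitation family. The paper's exact FANet architecture is not given here, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: a per-frequency descriptor is pooled from the feature map, passed through a small bottleneck MLP (random weights stand in for learned ones), and the resulting sigmoid weights rescale each frequency bin.

```python
import numpy as np

def frequency_attention(x, hidden=8, rng=None):
    """Sketch of frequency-wise attention (an assumption-based illustration,
    not the paper's exact FANet).

    x: feature map of shape (channels, freq_bins, time_frames).
    Returns the reweighted feature map and the per-frequency weights.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    c, f, t = x.shape
    # Squeeze: average over channels and time, one descriptor per frequency bin.
    desc = x.mean(axis=(0, 2))                   # shape (f,)
    # Excite: tiny bottleneck MLP; random weights stand in for learned ones.
    w1 = rng.standard_normal((f, hidden)) * 0.1
    w2 = rng.standard_normal((hidden, f)) * 0.1
    h = np.maximum(desc @ w1, 0.0)               # ReLU
    weights = 1.0 / (1.0 + np.exp(-(h @ w2)))    # sigmoid, shape (f,)
    # Scale: broadcast the frequency weights over channels and time.
    return x * weights[None, :, None], weights
```

In contrast to channel attention (SENet) or the sequential channel-and-spatial attention of CBAM, a scheme like this attends along the frequency axis of the spectrogram, which is plausibly why it helps when the main train/test shift is a device-dependent frequency response.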
Pages: 12