Convolutional-Recurrent Neural Network with the Tensor Fusion Mechanism for Acoustic Scene Classification

Cited by: 0
Authors
Jiang, Pengxu [1]
Guo, Ruxue [1]
Liang, Ruiyu [2]
Xie, Yue [2]
Zou, Cairong [1]
Affiliations
[1] Southeast University, School of Information Science and Engineering, Nanjing, People's Republic of China
[2] Nanjing Institute of Technology, School of Communication Engineering, Nanjing, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Acoustic scene classification; Convolutional Neural Network; Long Short-Term Memory; Fusion attention layer;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Subject classification code
08;
Abstract
Acoustic scene classification (ASC) is one of the key fields of artificial intelligence. Because scene audio features are short in duration, existing deep learning networks cannot fully capture the information in short-term audio. To address this, a convolutional-recurrent neural network with a tensor fusion mechanism (CRN-FM) is proposed for ASC. Each audio recording is divided into fixed-length segments, and spectral features extracted from each segment serve as the input. A convolutional neural network (CNN) then captures time-frequency information, and a long short-term memory (LSTM) network captures temporal details. Given the high-level features output by the two modules, the designed tensor fusion attention layer fuses the tensors according to their difference in information saturation. Finally, a SoftMax classifier classifies the scenes. Experimental results on the DCASE 2018 and 2019 ASC datasets demonstrate the effectiveness of the proposed approach.
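
The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify how "information saturation" is measured, so the scoring rule below (a dot product with a hypothetical `score_weights` vector followed by a softmax over the two branches) is an assumption, as are the function names `split_into_segments`, `softmax`, and `fuse`. The CNN and LSTM branches themselves are omitted; their outputs are represented by plain feature vectors of equal length.

```python
import math
from typing import List, Optional, Sequence


def split_into_segments(samples: Sequence[float], segment_len: int,
                        hop: Optional[int] = None) -> List[List[float]]:
    """Split a 1-D signal into fixed-length segments.

    A trailing remainder shorter than segment_len is dropped.
    hop defaults to segment_len (non-overlapping segments).
    """
    if hop is None:
        hop = segment_len
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    last_start = len(samples) - segment_len
    return [list(samples[i:i + segment_len]) for i in range(0, last_start + 1, hop)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(cnn_feat: Sequence[float], lstm_feat: Sequence[float],
         score_weights: Sequence[float]) -> List[float]:
    """Attention-weighted sum of two branch outputs.

    Assumption: each branch gets a scalar score dot(score_weights, feat),
    and the softmax of the two scores gives the fusion weights.
    """
    if not (len(cnn_feat) == len(lstm_feat) == len(score_weights)):
        raise ValueError("all vectors must have the same length")
    scores = [sum(w * x for w, x in zip(score_weights, feat))
              for feat in (cnn_feat, lstm_feat)]
    a_cnn, a_lstm = softmax(scores)
    return [a_cnn * c + a_lstm * l for c, l in zip(cnn_feat, lstm_feat)]
```

In a real system, `score_weights` would be learned jointly with the two branches, and the fused vector would be passed through a linear layer and `softmax` to obtain scene probabilities.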
Pages: 1470-1474
Page count: 5