Marine Mammal Call Classification Using a Multi-Scale Two-Channel Fusion Network (MT-Resformer)

Cited by: 0
Authors
Li, Xiang [1 ,2 ]
Dong, Chao [1 ,2 ,3 ]
Dong, Guixin [4 ]
Cui, Xuerong [1 ]
Chen, Yankun [2 ,3 ]
Zhang, Peng [4 ]
Li, Zhanwei [4 ]
Affiliations
[1] China Univ Petr East China, Coll Oceanog & Space Informat, Guzhenkou Campus, Qingdao 266000, Peoples R China
[2] Key Lab Marine Environm Survey Technol & Applicat, Guangzhou 510300, Peoples R China
[3] Minist Nat Resources, South China Sea Marine Survey Ctr, Guangzhou 510300, Peoples R China
[4] Chimelong Grp Co, Guangzhou 511430, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
marine mammal vocalization classification; audio feature fusion; multi-scale; dual-channel network; convolutional neural network; whistles; model;
DOI
10.3390/jmse13050944
Chinese Library Classification (CLC)
U6 [Water Transportation]; P75 [Ocean Engineering];
Discipline Classification Codes
0814 ; 081505 ; 0824 ; 082401 ;
Abstract
The classification of high-frequency marine mammal vocalizations is often hampered by the limitations of standard acoustic features, which are sensitive in the mid-to-low frequency range but offer poor resolution at high frequencies; in addition, single-channel networks can cap overall classification performance. To address these challenges, we introduce MT-Resformer, a dual-channel model with a multi-scale framework designed for classifying marine mammal vocalizations. Our approach employs a feature fusion strategy that combines the constant-Q spectrogram with Mel filter-bank spectrogram features, compensating for the low high-frequency resolution of Mel spectrograms. The MT-Resformer model incorporates two key components: a multi-scale parallel residual network (MResNet) channel and a Transformer network channel. A multi-layer perceptron (MLP) dynamically regulates the weighting of the two channels, enabling flexible feature fusion. Experimental results validate the proposed approach, yielding classification accuracies of 99.17% on the Watkins dataset and 95.22% on the ChangLong dataset, underscoring its strong performance.
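The MLP-regulated weighting of the two channels described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumed reading of the mechanism, not the paper's implementation: the embedding dimension, hidden size, and the choice of a ReLU hidden layer with a softmax gate are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gated_fusion(feat_res, feat_trans, W1, b1, W2, b2):
    """Fuse the two channel embeddings with a small MLP gate.

    feat_res / feat_trans: embeddings from the residual-network
    and Transformer channels (hypothetical shapes).
    Returns the weighted sum and the two channel weights.
    """
    x = np.concatenate([feat_res, feat_trans])   # joint descriptor
    h = np.maximum(0.0, W1 @ x + b1)             # hidden ReLU layer
    w = softmax(W2 @ h + b2)                     # two nonnegative weights summing to 1
    return w[0] * feat_res + w[1] * feat_trans, w

d, hdim = 8, 4                                   # toy sizes for illustration
W1 = rng.standard_normal((hdim, 2 * d)); b1 = np.zeros(hdim)
W2 = rng.standard_normal((2, hdim));     b2 = np.zeros(2)

fused, w = gated_fusion(rng.standard_normal(d), rng.standard_normal(d),
                        W1, b1, W2, b2)
```

Because the gate is a learned function of both embeddings, the fusion weights can shift per input, e.g. favoring the Transformer channel on calls whose discriminative energy is spread over time.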
Pages: 26