Birdsong classification based on multi feature channel fusion

Cited by: 0
Authors
Zhihua Liu
Wenjie Chen
Aibin Chen
Guoxiong Zhou
Jizheng Yi
Affiliations
[1] Central South University of Forestry and Technology, Institute of Artificial Intelligence Application, College of Computer and Information Engineering
Source
Multimedia Tools and Applications | 2022 / Vol. 81
Keywords
Convolutional neural network; Birdsong classification; Feature fusion; Time domain characteristics; Time-frequency domain features;
DOI
Not available
Abstract
Birdsong in nature is continuous in time. To exploit this property, this paper proposes a birdsong classification model with two feature channels that combines time-domain and time-frequency-domain features. To make better use of these features, an improved average threshold method is used to denoise the original time-domain waveform and reduce the influence of noise. The most suitable feature extractor for each feature and the best method for fusing the two are discussed. A 3D convolutional neural network (3DCNN) and a 2D convolutional neural network (2DCNN) are applied as feature extractors for the log-Mel spectrum and the waveform images, respectively. The high-level features extracted from the two channels are fused at the middle stage, and the resulting enhanced feature is fed to a double gated recurrent unit (d-GRU) network. Birdsongs of four species from Xeno-Canto were selected for testing. The results show that three methods each improve classification: fusing time-domain and time-frequency-domain features, weighted average threshold noise reduction, and extracting birdsong features with different types of feature extractors. The proposed method achieved a mean average precision (MAP) of 95.9% in the classification comparison experiments, an encouraging result.
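The abstract reports its headline result as mean average precision (MAP). As a reference point, the following is a minimal sketch of how MAP is commonly computed for a multi-class task such as the four-species setting described above. The function names, the input layout, and the choice to macro-average over classes are assumptions made for illustration; this record does not state the exact evaluation protocol the authors used.

```python
def average_precision(labels, scores):
    """Average precision for one class.

    labels: 0/1 flags, 1 if the clip belongs to the class (assumed layout).
    scores: model confidence for the class, one value per clip.
    """
    # Rank clips from highest to lowest confidence.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision measured at the rank of each correctly retrieved clip.
            precision_sum += hits / rank
    # A class with no positive clips contributes 0.0 here (an assumption).
    return precision_sum / hits if hits else 0.0


def mean_average_precision(y_true, y_score):
    """Macro-average of per-class average precision.

    y_true[i][c] is 1 if clip i belongs to class c, else 0.
    y_score[i][c] is the model's confidence that clip i belongs to class c.
    """
    n_classes = len(y_true[0])
    per_class = []
    for c in range(n_classes):
        labels = [row[c] for row in y_true]
        scores = [row[c] for row in y_score]
        per_class.append(average_precision(labels, scores))
    return sum(per_class) / n_classes
```

Under this definition, a class whose positive clips are all ranked above its negative clips scores 1.0, and each negative clip ranked above a positive one lowers the score.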
Pages: 15469-15490
Page count: 21