Birdsong classification based on multi-feature fusion

Cited by: 23
Authors
Yan, Na [1]
Chen, Aibin [1,3]
Zhou, Guoxiong [1]
Zhang, Zhiqiang [2]
Liu, Xiangyong [4]
Wang, Jianwu [5]
Liu, Zhihua [1]
Chen, Wenjie [1]
Affiliations
[1] Cent South Univ Forestry & Technol, Coll Comp & Informat Engn, Inst Artificial Intelligence Applicat, Changsha, Peoples R China
[2] Cent South Univ Forestry & Technol, Coll Forestry, Wildlife Conservat & Utilizat Lab, Changsha, Peoples R China
[3] Cent South Univ Forestry & Technol, Coll Life Sci & Technol, Hunan Prov Key Lab Urban Forest Ecol, Changsha, Peoples R China
[4] Hunan Zixing Artificial Intelligence Res Acad, Hunan Zixing, Peoples R China
[5] HuangFengQiao State Owned Forest Farm, Youxian Cty, Hunan, Peoples R China
Keywords
Birdsong classification; Acoustic feature; Feature fusion; 3DCNN-LSTM; NEURAL-NETWORKS; RECOGNITION; SOUNDS; MFCC
DOI
10.1007/s11042-021-11396-9
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Birdsong classification is important for monitoring bird populations in their habitats. To handle birdsong datasets with complex and diverse background audio, this paper introduces Chroma, an acoustic feature used in voice and music analysis. Chroma is concatenated and fused with the commonly used birdsong features Log-Mel Spectrogram (LM) and Mel Frequency Cepstrum Coefficient (MFCC) to enrich the representational capacity of any single feature. Because birdsong changes continuously and dynamically over time, a combined 3DCNN-LSTM model is proposed as the classifier to make the network more sensitive to time-varying birdsong information. Audio data for four birds were selected from the Xeno-Canto website to evaluate how LM, MFCC, and Chroma can be fused to capture as much of the birdsong audio information as possible. The experimental results show that the LM-MFCC-C feature combination performs best, reaching a mean average precision (mAP) of 97.9%.
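The splicing of LM, MFCC, and Chroma described in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion details, so everything here is an assumption: the feature matrices are random placeholders rather than real audio features, the dimensions (64 Mel bands, 20 MFCCs, 12 chroma bins, 100 frames) are illustrative, and the per-row standardization step and the helper names `standardize` and `fuse_features` are hypothetical.

```python
import numpy as np


def standardize(feature):
    """Scale each feature row to zero mean and unit variance (assumed preprocessing)."""
    mean = feature.mean(axis=1, keepdims=True)
    std = feature.std(axis=1, keepdims=True)
    return (feature - mean) / (std + 1e-8)


def fuse_features(lm, mfcc, chroma):
    """Concatenate three (n_features, n_frames) matrices along the feature axis.

    This is one plausible reading of "spliced and fused"; the paper's exact
    scheme is not stated in the abstract.
    """
    features = [np.asarray(f, dtype=np.float64) for f in (lm, mfcc, chroma)]
    frame_counts = {f.shape[1] for f in features}
    if len(frame_counts) != 1:
        raise ValueError("all features must share the same number of frames")
    return np.concatenate([standardize(f) for f in features], axis=0)


# Placeholder inputs standing in for real LM, MFCC, and Chroma features.
rng = np.random.default_rng(0)
lm = rng.normal(size=(64, 100))      # assumed: 64 Mel bands
mfcc = rng.normal(size=(20, 100))    # assumed: 20 cepstral coefficients
chroma = rng.random(size=(12, 100))  # 12 pitch classes

fused = fuse_features(lm, mfcc, chroma)
print(fused.shape)  # (96, 100)
```

In practice the three inputs would be computed from the same audio clip with identical frame settings so that their time axes line up; the fused matrix could then be cut into fixed-length segments for a classifier such as the 3DCNN-LSTM named in the abstract.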
Pages: 36529-36547
Page count: 19