Handcrafted features and late fusion with deep learning for bird sound classification

Cited by: 69
Authors
Xie, Jie [1, 3]
Zhu, Mingying [2]
Affiliations
[1] Jiangnan Univ, Sch Internet Things Engn, Minist Educ, Key Lab Adv Proc Control Light Ind, Wuxi 214122, Jiangsu, Peoples R China
[2] Univ Ottawa, Dept Econ, Ottawa, ON K1N 6N5, Canada
[3] Jiangnan Univ, Jiangsu Key Lab Adv Food Mfg Equipment & Technol, Wuxi, Jiangsu, Peoples R China
Keywords
Bird sound classification; Convolutional neural networks; Acoustic feature; Visual feature; ACOUSTIC CLASSIFICATION;
DOI
10.1016/j.ecoinf.2019.05.007
Chinese Library Classification number
Q14 [Ecology (bioecology)];
Discipline classification code
071012 ; 0713 ;
Abstract
Automated classification of calling bird species is useful for large-scale temporal and spatial environmental monitoring. In this paper, we investigate acoustic features, visual features, and deep learning for bird sound classification. For the deep learning approach, convolutional neural network (CNN) layers are used for learning generalized features and for dimension reduction, while a conventional fully connected layer is used for classification. A unified end-to-end model is then built by combining those three layers to classify calling bird species. For the visual and acoustic features, two traditional classifiers are compared for classifying the bird sounds. Experimental results on 14 bird species indicate that the proposed deep learning method achieves the best F1-score of 94.36%, which is higher than that of the acoustic features approach (88.97%) and the visual features approach (88.87%). To further improve classification performance, a class-based late fusion method is explored. The final best classification F1-score is 95.95%, obtained by the late fusion of the acoustic features approach, the visual features approach, and deep learning.
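The abstract does not say how the class-based late fusion is computed. The following is a minimal sketch under one plausible reading: each of the three approaches outputs a probability per class, and each approach receives a separate weight for every class (for example, its per-class F1-score on a validation set). The function names, the three-class toy data, and all weight values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def class_based_late_fusion(
    probs_by_model: Dict[str, Sequence[float]],
    class_weights_by_model: Dict[str, Sequence[float]],
) -> List[float]:
    """Fuse per-class probabilities using one weight per (model, class) pair.

    Assumption: the fused score for class c is the sum over models of
    weight[model][c] * prob[model][c], renormalized to sum to 1.
    """
    n_classes = len(next(iter(probs_by_model.values())))
    fused = [0.0] * n_classes
    for name, probs in probs_by_model.items():
        weights = class_weights_by_model[name]
        for c in range(n_classes):
            fused[c] += weights[c] * probs[c]
    total = sum(fused)
    # Renormalize so the fused scores can be read as probabilities.
    return [v / total for v in fused] if total > 0 else fused


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Hypothetical outputs of the three approaches for one recording (3 classes).
probs = {
    "acoustic": [0.6, 0.3, 0.1],
    "visual": [0.2, 0.5, 0.3],
    "cnn": [0.3, 0.4, 0.3],
}
# Hypothetical per-class weights, e.g. validation F1-scores per class.
weights = {
    "acoustic": [0.9, 0.5, 0.5],
    "visual": [0.5, 0.6, 0.5],
    "cnn": [0.8, 0.7, 0.9],
}

fused = class_based_late_fusion(probs, weights)
# Unweighted mean of the three probability vectors, for comparison.
plain_average = [sum(p[c] for p in probs.values()) / len(probs) for c in range(3)]
```

In this toy case the unweighted average favors class 1, while the class-weighted fusion favors class 0 because the acoustic approach is assumed to be most reliable on that class. This shows why per-class weights can change the final decision.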
Pages: 74-81
Number of pages: 8