Convolutional Neural Networks for Scops Owl Sound Classification

Cited by: 26
Authors
Hidayat, Alam Ahmad [1 ]
Cenggoro, Tjeng Wawan [1 ,2 ]
Pardamean, Bens [1 ,3 ]
Affiliations
[1] Bina Nusantara Univ, Bioinformat & Data Sci Res Ctr, Jakarta 11480, Indonesia
[2] Bina Nusantara Univ, Sch Comp Sci, Comp Sci Dept, Jakarta 11480, Indonesia
[3] Bina Nusantara Univ, Comp Sci Dept, BINUS Grad Program Master Comp Sci, Jakarta 11480, Indonesia
Source
5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND COMPUTATIONAL INTELLIGENCE 2020 | 2021, Vol. 179
Keywords
acoustic features; bird sound classification; convolutional neural network; mean average precision; scops owl;
DOI
10.1016/j.procs.2021.12.010
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Adopting deep learning models for bird sound classification has become common practice in building robust automated bird sound detection systems. In this paper, we employ a four-layer Convolutional Neural Network (CNN) to classify different species of Indonesian scops owls based on their vocalizations. Two widely used representations of an acoustic signal, the log-scaled mel-spectrogram and Mel-Frequency Cepstral Coefficients (MFCC), are extracted from each sound file and fed into the network separately to compare model performance across inputs. A more complex CNN that simultaneously processes the two acoustic representations is proposed to provide a direct comparison with the baseline model. The dual-input network is the best-performing model in our experiments, achieving 97.55% Mean Average Precision (MAP). Meanwhile, the baseline model achieves a MAP of 94.36% with the mel-spectrogram input and 96.08% with the MFCC input. (C) 2021 The Authors. Published by Elsevier B.V.
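The dual-input idea described in the abstract can be sketched as follows: each acoustic representation (log-mel spectrogram and MFCC matrix) passes through its own convolutional branch, the pooled branch outputs are concatenated, and a final dense layer produces class probabilities. This is a minimal illustrative sketch only; the layer sizes, kernel counts, the class count (5), and the random inputs are all assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(x, kernels):
    """Naive 'valid' 2-D convolution with ReLU: x is (H, W), kernels is (K, kh, kw)."""
    K, kh, kw = kernels.shape
    H, W = x.shape
    out = np.empty((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[k, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[k])
    return np.maximum(out, 0.0)

def branch(x, kernels):
    """One convolutional branch followed by global average pooling -> (K,) vector."""
    return conv2d_relu(x, kernels).mean(axis=(1, 2))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_species = 5                               # illustrative class count (assumption)
mel = rng.standard_normal((64, 100))        # stand-in log-mel spectrogram (64 bands x 100 frames)
mfcc = rng.standard_normal((20, 100))       # stand-in MFCC matrix (20 coefficients x 100 frames)

# Separate (untrained, random) kernels per branch, then a shared output layer.
k_mel = rng.standard_normal((8, 3, 3)) * 0.1
k_mfcc = rng.standard_normal((8, 3, 3)) * 0.1
W_out = rng.standard_normal((n_species, 16)) * 0.1

# Concatenate the two branch feature vectors before classification.
features = np.concatenate([branch(mel, k_mel), branch(mfcc, k_mfcc)])
probs = softmax(W_out @ features)
print(probs.shape)
```

In a real system the convolutions would be stacked (the paper uses four layers) and trained end to end; the point here is only the two-branch, concatenate-then-classify structure that distinguishes the dual-input model from the single-input baseline.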
Pages: 81-87
Page count: 7