Bird sounds classification by combining PNCC and robust Mel-log filter bank features

Cited by: 4

Authors:

Badi, Alzahra ^{[1]}

Ko, Kyungdeuk ^{[1]}

Ko, Hanseok ^{[1]}

Affiliations:

[1] Korea Univ, Sch Elect Engn, Engn Bldg Room 419,Anam Campus 145 Anam Ro, Seoul 02841, South Korea

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | 2019 / Vol. 38 / Issue 01

Keywords:

Acoustic event recognition; Environmental sound classification; CNN (Convolutional Neural Network); Wiener filter; PNCCs (Power Normalized Cepstral Coefficients);

DOI:

10.7776/ASK.2019.38.1.039

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

This paper proposes combining features to improve the classification accuracy of sounds in noisy environments using a CNN (Convolutional Neural Network). A robust log Mel-filter bank feature, obtained with a Wiener filter, and PNCCs (Power Normalized Cepstral Coefficients) are extracted and combined into a 2-dimensional feature that serves as input to the CNN. An eBird database is used to classify 43 bird species recorded in their natural environment. To evaluate the combined feature under noisy conditions, the database is augmented with 3 types of noise at 4 SNRs (Signal-to-Noise Ratios): 20 dB, 10 dB, 5 dB, and 0 dB. The combined feature is compared with the log Mel-filter bank, with and without the Wiener filter, and with the PNCCs. Under clean conditions, the combined feature outperforms the other features, with a 1.34 % increase in overall average accuracy. Under noisy conditions at the 4 SNR levels, accuracy increases by 1.06 % and 0.65 % for shop and schoolyard noise backgrounds, respectively.
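The noise augmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the noise was mixed, so the approach here (scaling the noise by mean power so the mixture reaches a target SNR) is an assumption, and the helper names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical. The sine tone and random noise are stand-ins for real recordings.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of audio samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Add noise to signal, scaled so the mixture has the requested SNR in dB.

    Assumption: SNR is defined on mean power over the whole clip.
    """
    # Repeat the noise if it is shorter than the signal, then trim to length.
    reps = -(-len(signal) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[: len(signal)]
    # Target noise power is signal power divided by 10^(SNR/10).
    scale = math.sqrt(
        mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10))
    )
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, mixture):
    """SNR of a mixture, given the clean signal it was built from."""
    residual = [m - s for m, s in zip(mixture, signal)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))


# Synthetic stand-ins: a 440 Hz tone as the "clean" clip, Gaussian noise as background.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
background = [random.gauss(0.0, 1.0) for _ in range(4000)]

# One noisy copy per SNR level listed in the abstract.
noisy = {snr: mix_at_snr(clean, background, snr) for snr in (20, 10, 5, 0)}
```

Calling `measured_snr_db(clean, noisy[10])` recovers the requested level up to floating-point error, which is a quick check that the scaling is right before any features are extracted.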


Pages: 39-46

Page count: 8

