Bird sounds classification by combining PNCC and robust Mel-log filter bank features

Cited by: 4
Authors
Badi, Alzahra [1]
Ko, Kyungdeuk [1]
Ko, Hanseok [1]
Affiliations
[1] Korea Univ, Sch Elect Engn, Engn Bldg Room 419,Anam Campus 145 Anam Ro, Seoul 02841, South Korea
Source
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA | 2019 / Vol. 38 / No. 01
Keywords
Acoustic event recognition; Environmental sound classification; CNN (Convolutional Neural Network); Wiener filter; PNCCs (Power Normalized Cepstral Coefficients)
DOI
10.7776/ASK.2019.38.1.039
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, feature combination is proposed as a way to improve the classification accuracy of sounds in noisy environments using a CNN (Convolutional Neural Network). A robust log Mel-filter bank feature obtained with a Wiener filter and PNCCs (Power Normalized Cepstral Coefficients) are extracted and combined into a 2-dimensional feature that serves as input to the CNN. An eBird database is used to classify 43 bird species recorded in their natural environment. To evaluate the combined feature under noisy conditions, the database is augmented with 3 types of noise at 4 SNRs (Signal-to-Noise Ratios): 20 dB, 10 dB, 5 dB, and 0 dB. The combined feature is compared with the log Mel-filter bank (with and without the Wiener filter) and with the PNCCs. The combined feature outperforms the other features under clean conditions, with a 1.34 % increase in overall average accuracy. In addition, accuracy under noisy conditions across the 4 SNR levels increases by 1.06 % and 0.65 % for shop and schoolyard noise backgrounds, respectively.
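The abstract describes augmenting the recordings with noise at fixed SNRs (20 dB, 10 dB, 5 dB, 0 dB) but does not say how the mixing was done. The following is a minimal sketch of one common way to do it, assuming the SNR is defined on average signal power; the function name `mix_at_snr`, the synthetic tone, and the random noise are illustrative assumptions, not the authors' data or code.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: assumes SNR is the ratio of average powers.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Repeat or trim the noise so it matches the clean signal length.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_clean / (gain^2 * p_noise)), solved for gain.
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

# Stand-in signals (assumptions): a 1 s, 440 Hz tone at 16 kHz and white noise.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(8000)

# One noisy copy per SNR level listed in the abstract.
noisy = {snr: mix_at_snr(clean, noise, snr) for snr in (20, 10, 5, 0)}
```

Each entry of `noisy` has the same length as `clean`; in a real pipeline the stand-in signals would be replaced by a bird recording and a background-noise recording at the same sampling rate.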
Pages: 39-46
Page count: 8