Bird Call Classification Using DNN-Based Acoustic Modelling

Cited by: 3
Authors
Rajan, Rajeev [1, 2]
Johnson, Jisna [1, 2]
Kareem, Noumida Abdul [1, 2]
Affiliations
[1] Coll Engn, Dept Elect & Commun Engn, Thiruvananthapuram, Kerala, India
[2] APJ Abdul Kalam Technol Univ, Thiruvananthapuram, Kerala, India
Keywords
Hidden Markov model; Gaussian mixture model; Deep neural network; Convolutional neural network
DOI
10.1007/s00034-021-01896-2
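Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract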
Bird call recognition using deep neural network-hidden Markov model (DNN-HMM)-based transcription is proposed. The work attempts to adapt the human speech recognition framework to bird call classification through a transcription approach. Initially, phone transcriptions are generated using CMU-Sphinx, and the lexicons are modified using group delay-based segmentation. Bird call transcription is then implemented in a hybrid DNN-HMM framework through DNN-based acoustic modelling. For the acoustic modelling, mel-frequency cepstral coefficient (MFCC) features are computed and used to train monophone models and then triphone models, followed by linear discriminant analysis and maximum likelihood linear transform. In the final phase, the transcribed phonemes are corrected using context-based rules. The proposed approach is evaluated on a dataset of 563 audio tracks covering ten species. The hybrid DNN-HMM approach achieves an accuracy of 94.46%, outperforming the convolutional neural network and long short-term memory frameworks.
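The MFCC front end mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the frame length, frame shift, FFT size, number of mel filters, or number of coefficients, so the values below (25 ms frames, 10 ms shift, 512-point FFT, 26 filters, 13 coefficients, 16 kHz sampling) are assumptions taken from common speech-recognition practice, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    # Widely used mel-scale formula.
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):        # rising slope
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):       # falling slope
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank

def mfcc(signal, sample_rate=16000, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    # All default parameter values are assumptions, not taken from the paper.
    signal = np.asarray(signal, dtype=float)
    flen = int(round(frame_len * sample_rate))
    fstep = int(round(frame_step * sample_rate))
    if len(signal) < flen:                   # pad very short inputs to one frame
        signal = np.pad(signal, (0, flen - len(signal)))
    n_frames = 1 + (len(signal) - flen) // fstep
    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(flen)[None, :] + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Power spectrum of each frame (frames are zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Mel filterbank energies, then log compression (small floor avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalised DCT-II over the filter axis; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

# Usage: one second of a synthetic 3 kHz tone standing in for a bird call.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 3000.0 * t))
print(features.shape)  # (number of frames, number of coefficients)
```

In a hybrid DNN-HMM system, per-frame feature vectors of this kind would typically be the input to the acoustic model; the monophone, triphone, LDA and MLLT stages named in the abstract operate on top of such features and are not covered by this sketch.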
Pages: 2669 - 2680
Number of pages: 12
Related papers
50 in total
  • [21] DNN-BASED WIRELESS POSITIONING IN AN OUTDOOR ENVIRONMENT
    Lee, Jin-Young
    Eom, Chahyeon
    Kwak, Youngsu
    Kang, Hong-Goo
    Lee, Chungyong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 3799 - 3803
  • [22] DNN-based Intelligent Beamforming on a Programmable Metasurface
    Li S.
    Fu S.
    Xu F.
    Journal of Radars, 2021, 10 (02) : 259 - 266
  • [23] DNN based Acoustic Scene Classification using Score Fusion of MFCC and Inverse MFCC
    Paseddula, Chandrasekhar
    Gangashetty, Suryakanth V.
    2018 IEEE 13TH INTERNATIONAL CONFERENCE ON INDUSTRIAL AND INFORMATION SYSTEMS (IEEE ICIIS), 2018, : 31 - 34
  • [24] DNN-Based Full-Band Speech Synthesis Using GMM Approximation of Spectral Envelope
    Koguchi, Junya
    Takamichi, Shinnosuke
    Morise, Masanori
    Saruwatari, Hiroshi
    Sagayama, Shigeki
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (12) : 2673 - 2681
  • [25] ON USING HETEROGENEOUS DATA FOR VEHICLE-BASED SPEECH RECOGNITION: A DNN-BASED APPROACH
    Feng, Xue
    Richardson, Brigitte
    Amman, Scott
    Glass, James
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4385 - 4389
  • [26] DNN-Based Unit Selection Using Frame-Sized Speech Segments
    Zhou, Zhi-Ping
    Ling, Zhen-Hua
    2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016,
  • [27] DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus
    Yamashita, Yuki
    Koriyama, Tomoki
    Saito, Yuki
    Takamichi, Shinnosuke
    Ijima, Yusuke
    Masumura, Ryo
    Saruwatari, Hiroshi
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6438 - 6443
  • [28] DNN-based anomaly prediction for the uncertainty in visual SLAM
    Bosdelekidis, Vasileios
    Johansen, Tor A.
    Sokolova, Nadezda
    2022 17TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV), 2022, : 684 - 691
  • [29] Model integration for HMM- and DNN-based speech synthesis using Product-of-Experts framework
    Tachibana, Kentaro
    Toda, Tomoki
    Shiga, Yoshinori
    Kawai, Hisashi
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2288 - 2292
  • [30] Towards breaking DNN-based audio steganalysis with GAN
    Wang, Jie
    Wang, Rangding
    Dong, Li
    Yan, Diqun
    Zhang, Xueyuan
    Lin, Yuzhen
    INTERNATIONAL JOURNAL OF AUTONOMOUS AND ADAPTIVE COMMUNICATIONS SYSTEMS, 2021, 14 (04) : 371 - 383