Bird Call Classification Using DNN-Based Acoustic Modelling

Citations: 4
Authors
Rajan, Rajeev [1, 2]
Johnson, Jisna [1, 2]
Kareem, Noumida Abdul [1, 2]
Affiliations
[1] Coll Engn, Dept Elect & Commun Engn, Thiruvananthapuram, Kerala, India
[2] APJ Abdul Kalam Technol Univ, Thiruvananthapuram, Kerala, India
Keywords
Hidden Markov model; Gaussian mixture model; Deep neural network; Convolutional neural network
DOI
10.1007/s00034-021-01896-2
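Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809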
Abstract
This work proposes bird call recognition using deep neural network-hidden Markov model (DNN-HMM)-based transcription, adapting the human speech recognition framework to bird call classification through a transcription approach. Initially, phone transcriptions are generated using CMU-Sphinx, and the lexicons are modified using group delay-based segmentation. Bird call transcription is then implemented in a hybrid DNN-HMM framework through DNN-based acoustic modelling. During acoustic modelling, mel-frequency cepstral coefficients (MFCCs) are computed and evaluated with monophone models and triphone models, followed by linear discriminant analysis and maximum likelihood linear transform. In the final phase, the transcribed phonemes are corrected using context-based rules. The proposed approach is evaluated on a dataset of 563 audio tracks covering ten species. With an accuracy of 94.46%, the hybrid DNN-HMM approach outperforms the convolutional neural network and long short-term memory frameworks.
Pages: 2669-2680
Number of pages: 12