Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Cited by: 0
Authors
Lambamo, Wondimu [1]
Srinivasagan, Ramasamy [1, 2]
Jifara, Worku [1 ]
Affiliations
[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia
[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia
Source
PAN-AFRICAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PT I, PANAFRICON AI 2023 | 2024, Vol. 2068
Keywords
Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises; FEATURES; MFCC; VERIFICATION;
DOI
10.1007/978-3-031-57624-9_9
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Speaker identification is a biometric mechanism that determines which person is speaking from a set of known speakers. It has vital applications in areas such as security, surveillance, and forensic investigation. Speaker identification systems achieve good accuracy on clean speech. However, their performance degrades under noisy and mismatched conditions. Recently, hybrid networks that combine convolutional neural networks (CNN) with enhanced recurrent neural network (RNN) variants have performed better in speech recognition, image classification, and other pattern recognition tasks. Moreover, cochleogram features have shown better accuracy in speech and speaker recognition under noisy conditions. However, no prior work has attempted speaker recognition using a hybrid of CNN and enhanced RNN variants with cochleogram input to improve model accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions using a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network on cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNR) from -5 dB to 20 dB in 5 dB steps. The proposed model attained accuracies of 93.15%, 97.55%, and 98.60% on the dataset with real-world noise at SNRs of -5 dB, 10 dB, and 20 dB, respectively. The proposed model shows approximately similar performance on real-world noise and WGN at the same SNR levels. On the dataset without additive noise, the model achieved 98.85% accuracy. The evaluation accuracy and the comparison with previous works indicate that our model has better accuracy.
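The abstract states that noise was added at SNRs from -5 dB to 20 dB in 5 dB steps but does not say how the mixing was done. The following is a minimal sketch of one common way to do it, not the authors' procedure: the noise is rescaled so that the ratio of speech power to noise power matches the target SNR, then added sample by sample. The function names `mix_at_snr` and `measured_snr_db`, the synthetic sine "speech" signal, and the 16 kHz sample rate are all hypothetical and chosen only for illustration.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the requested SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (scale^2 * p_noise)), solved for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture given the clean reference signal."""
    p_speech = sum(s * s for s in speech)
    p_resid = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_resid)


# SNR grid described in the abstract: -5 dB up to 20 dB in 5 dB steps.
SNR_LEVELS_DB = list(range(-5, 21, 5))

# Assumed stand-ins: a 220 Hz tone as "speech" and seeded white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
wgn = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for snr in SNR_LEVELS_DB:
    noisy = mix_at_snr(speech, wgn, snr)
    print(snr, round(measured_snr_db(speech, noisy), 2))
```

The same scaling applies to a recorded real-world noise clip in place of `wgn`, provided the clip is at least as long as the utterance.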
Pages: 154 - 175
Page count: 22
Related papers
50 in total
  • [21] Model Free Identification of Traffic Conditions Using Unmanned Aerial Vehicles and Deep Learning
    Eleni I. Vlahogianni
    Javier Del Ser
    Konstantinos Kepaptsoglou
    Ibai Laña
    Journal of Big Data Analytics in Transportation, 2021, 3 (1) : 1 - 13
  • [22] Design of a Hybrid Bioinspired Deep Learning Model for Identification of Heart Diseases Using Clinical Parameters
    Kulkarni D.
    Soni R.
    SN Computer Science, 4 (5)
  • [23] Voiceprint Identification for Limited Dataset Using the Deep Migration Hybrid Model Based on Transfer Learning
    Sun, Cunwei
    Yang, Yuxin
    Wen, Chang
    Xie, Kai
    Wen, Fangqing
    SENSORS, 2018, 18 (07)
  • [24] Speaker Identification in Noisy Environment with Use of the Precise Model of the Human Auditory System
    Azetsu, Tadahiro
    Abuku, Masahiro
    Suetake, Noriaki
    Uchino, Eiji
    INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012 : 92 - 95
  • [25] Emotional speaker identification using a novel capsule nets model
    Nassif, Ali Bou
    Shahin, Ismail
    Elnagar, Ashraf
    Velayudhan, Divya
    Alhudhaif, Adi
    Polat, Kemal
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 193
  • [26] Sheep Identification Using a Hybrid Deep Learning and Bayesian Optimization Approach
    Salama, Aya
    Hassanien, Aboul Ellah
    Fahmy, Aly
    IEEE ACCESS, 2019, 7 : 31681 - 31687
  • [27] Speaker verification using IMNMF and MFCC with feature warping under noisy environment
    Jiang, Changjiang
    Ba, Lifang
    Tang, Xianlun
    Wen, Dengfeng
    2018 CHINESE AUTOMATION CONGRESS (CAC), 2018 : 2583 - 2588
  • [28] Real-Time Speaker Identification Using Speaker Model Distance
    Zeinali, Hossein
    Sameti, Hossein
    Hadian, Hossein
    2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015 : 643 - 647
  • [29] A late fusion deep neural network for robust speaker identification using raw waveforms and gammatone cepstral coefficients
    Salvati, Daniele
    Drioli, Carlo
    Foresti, Gian Luca
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 222
  • [30] Deep Learning-Based End-to-End Speaker Identification Using Time-Frequency Representation of Speech Signal
    Saritha, Banala
    Laskar, Mohammad Azharuddin
    Kirupakaran, Anish Monsley
    Laskar, Rabul Hussain
    Choudhury, Madhuchhanda
    Shome, Nirupam
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 43 (3) : 1839 - 1861