Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Cited by: 0
Authors
Lambamo, Wondimu [1]
Srinivasagan, Ramasamy [1, 2]
Jifara, Worku [1 ]
Affiliations
[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia
[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia
Source
PAN-AFRICAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PT I, PANAFRICON AI 2023 | 2024 / Vol. 2068
Keywords
Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises; FEATURES; MFCC; VERIFICATION;
DOI
10.1007/978-3-031-57624-9_9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker identification is a biometric mechanism that determines who is speaking from a set of known speakers. It has vital applications in areas such as security, surveillance, and forensic investigation. Speaker identification systems achieve good accuracy on clean speech, but their performance degrades under noisy and mismatched conditions. Recently, hybrid networks that combine convolutional neural networks (CNN) with enhanced recurrent neural network (RNN) variants have shown improved performance in speech recognition, image classification, and other pattern recognition tasks. Moreover, cochleogram features have shown better accuracy in speech and speaker recognition under noisy conditions. However, no attempt has been made to use a hybrid of CNN and enhanced RNN variants with cochleogram input for speaker recognition in order to improve model accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions that uses a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network on cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNR) from -5 dB to 20 dB in 5 dB steps. The proposed model attained accuracies of 93.15%, 97.55%, and 98.60% on the dataset with real-world noise at SNRs of -5 dB, 10 dB, and 20 dB, respectively. It shows approximately similar performance on real-world noise and WGN at the same SNR levels. On the dataset without additive noise, the model achieved 98.85% accuracy. The evaluation results and a comparison with previous work indicate that the proposed model has better accuracy.
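The abstract states that noise was added at SNRs from -5 dB to 20 dB in 5 dB steps but does not describe the mixing procedure. The following is a minimal sketch of one common way to do this, assuming the usual definition SNR = 10 log10(P_speech / P_noise) with power measured as the mean squared sample value. The names `mix_at_snr`, `measured_snr_db`, and `SNR_LEVELS_DB` are hypothetical and do not come from the paper, and the synthetic tone and Gaussian noise stand in for real VoxCeleb1 audio.

```python
import math
import random

# SNR grid described in the abstract: -5 dB up to 20 dB in 5 dB steps.
SNR_LEVELS_DB = list(range(-5, 25, 5))


def _power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target SNR in dB.

    Assumption: the noise is simply truncated to the speech length; the paper
    may have selected or aligned noise segments differently.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech, p_noise = _power(speech), _power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Solve p_speech / (gain**2 * p_noise) = 10**(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal, given the clean signal it was built from."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(_power(clean) / _power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # One second of a 220 Hz tone at 16 kHz as a stand-in for speech.
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]  # white Gaussian noise
    for snr in SNR_LEVELS_DB:
        noisy = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, noisy), 2))
```

Scaling the noise rather than the speech keeps the speech level fixed across conditions, so only the noise level changes between SNR settings. This is one design choice among several and is not confirmed by the source.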
Pages: 154-175
Number of pages: 22
Related papers
50 items in total
  • [31] Speaker Identification Using Semi-supervised Learning
    Fazakis, Nikos
    Karlos, Stamatis
    Kotsiantis, Sotiris
    Sgarbas, Kyriakos
    SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 389 - 396
  • [32] Skin Cancer Diagnosis using Deep Learning, Transfer Learning and Hybrid Model
    Prakash, Ravi
    Pandey, Trilok Nath
    Dash, Bibhuti Bhusan
    Patra, Sudhansu Shekhar
    De, Utpal Chandra
    Tripathy, Abinash
    2024 SECOND INTERNATIONAL CONFERENCE ON INVENTIVE COMPUTING AND INFORMATICS, ICICI 2024, 2024, : 90 - 95
  • [33] Speaker identification using spectrogram and learning vector quantization
    Li, Penghua
    Zhang, Shunxing
    Feng, Huizong
    Li, Yuanyuan
    Journal of Computational Information Systems, 2015, 11 (09): : 3087 - 3095
  • [34] End-to-End Speaker Identification in Noisy and Reverberant Environments Using Raw Waveform Convolutional Neural Networks
    Salvati, Daniele
    Drioli, Carlo
    Foresti, Gian Luca
    INTERSPEECH 2019, 2019, : 4335 - 4339
  • [35] Speaker identification using hybrid Karhunen-Loeve transform and Gaussian mixture model approach
    Chen, CCT
    Chen, CT
    Hou, CK
    PATTERN RECOGNITION, 2004, 37 (05) : 1073 - 1075
  • [36] Text-Independent Speaker Identification Using the Histogram Transform Model
    Ma, Zhanyu
    Yu, Hong
    Tan, Zheng-Hua
    Guo, Jun
    IEEE ACCESS, 2016, 4 : 9733 - 9739
  • [37] Robust Far-Field Speaker Identification under Mismatched Conditions
    Jin, Qin
    Schultz, Tanja
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1893 - 1896
  • [38] A robust DNN model for text-independent speaker identification using non-speaker embeddings in diverse data conditions
    Nirupam Shome
    Banala Saritha
    Richik Kashyap
    Rabul Hussain Laskar
    Neural Computing and Applications, 2023, 35 : 18933 - 18947
  • [39] Speaker Identification and Verification of Noisy Speech Using Multitaper MFCC and Gaussian Mixture Models
    Veena, K. V.
    Mathew, Dominic
    PROCEEDINGS OF 2015 IEEE INTERNATIONAL CONFERENCE ON POWER, INSTRUMENTATION, CONTROL AND COMPUTING (PICC), 2015,
  • [40] Analysis and Investigation of Speaker Identification Problems Using Deep Learning Networks and the YOHO English Speech Dataset
    Almarshady, Nourah M.
    Alashban, Adal A.
    Alotaibi, Yousef A.
    APPLIED SCIENCES-BASEL, 2023, 13 (17):