Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Cited by: 0

Authors:

Lambamo, Wondimu [1]

Srinivasagan, Ramasamy [1,2]

Jifara, Worku [1]

Affiliations:

[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia

[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia

Source:

PAN-AFRICAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PT I, PANAFRICON AI 2023 | 2024 / Vol. 2068

Keywords:

Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises; FEATURES; MFCC; VERIFICATION;

DOI:

10.1007/978-3-031-57624-9_9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker identification is a biometric task that determines who is speaking from a set of known speakers. It has important applications in security, surveillance, forensic investigation, and other areas. Speaker identification systems achieve good accuracy on clean speech, but their performance degrades under noisy and mismatched conditions. Recently, hybrid networks combining convolutional neural networks (CNNs) with enhanced recurrent neural network (RNN) variants have performed well in speech recognition, image classification, and other pattern recognition tasks. Cochleogram features have also shown better accuracy in speech and speaker recognition under noisy conditions. However, no previous work has applied hybrid CNN and enhanced RNN variants to cochleogram input to improve speaker recognition accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions that uses a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network with cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset under three conditions: with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNRs) from -5 dB to 20 dB in 5 dB steps. With real-world noise, the proposed model attained accuracies of 93.15%, 97.55%, and 98.60% at SNRs of -5 dB, 10 dB, and 20 dB, respectively. Its performance with real-world noise and with WGN is approximately the same at equal SNR levels. Without additive noise, the model achieved 98.85% accuracy. These evaluation results, together with a comparison against previous work, indicate that the proposed model is more accurate than previous approaches.
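The noise-addition step in the abstract (mixing real-world noise or WGN into speech at a target SNR between -5 dB and 20 dB) can be sketched as follows. This is a minimal plain-Python illustration, not the authors' code. It assumes SNR is defined from mean signal power as 10·log10(P_speech / P_noise) and that a noise clip shorter than the speech is tiled to the speech length. A synthetic tone stands in for a VoxCeleb1 utterance. The function names `mix_at_snr`, `white_gaussian_noise`, and `measured_snr_db` are hypothetical.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db, then add it.

    Assumption: a noise clip shorter than the speech is tiled (repeated) to cover it.
    Returns (noisy_speech, scaled_noise).
    """
    noise = list(noise)
    if len(noise) < len(speech):
        reps = math.ceil(len(speech) / len(noise))
        noise = noise * reps
    noise = noise[: len(speech)]
    p_speech = signal_power(speech)
    p_noise = signal_power(noise)
    # Target noise power is P_speech / 10^(SNR/10); the amplitude scale is the
    # square root of target power over current noise power.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [scale * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


def white_gaussian_noise(length, seed=0):
    """Zero-mean, unit-variance Gaussian noise (reproducible via seed)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def measured_snr_db(speech, scaled_noise):
    """SNR in dB actually achieved by a given speech/noise pair."""
    return 10 * math.log10(signal_power(speech) / signal_power(scaled_noise))


if __name__ == "__main__":
    sr = 16000
    # Hypothetical stand-in for a speech clip: one second of a 220 Hz tone.
    speech = [0.5 * math.sin(2 * math.pi * 220 * t / sr) for t in range(sr)]
    noise = white_gaussian_noise(sr)
    for snr in range(-5, 25, 5):  # -5, 0, 5, 10, 15, 20 dB
        noisy, scaled = mix_at_snr(speech, noise, snr)
        print(snr, round(measured_snr_db(speech, scaled), 6))
```

The same `mix_at_snr` call would apply to a real-world noise recording loaded as a list of samples; only the noise source changes, which is consistent with the abstract's observation that both noise types are compared at matched SNR levels.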
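A cochleogram is commonly computed by passing speech through a gammatone filterbank whose center frequencies are equally spaced on the equivalent rectangular bandwidth (ERB) scale, then taking framed energy per channel. The sketch below covers only the filterbank design and is not taken from the paper. It assumes the Glasberg and Moore ERB formulas and a 4th-order gammatone impulse response with bandwidth parameter b = 1.019·ERB. The channel count and frequency range (64 channels, 50 Hz to 8 kHz) are hypothetical, since the abstract does not state the paper's settings, and all function names are illustrative.

```python
import math


def erb_rate(freq_hz):
    """Glasberg & Moore ERB-rate scale: number of ERBs below freq_hz."""
    return 21.4 * math.log10(1.0 + 0.00437 * freq_hz)


def inverse_erb_rate(erbs):
    """Frequency in Hz corresponding to a given ERB-rate value."""
    return (10 ** (erbs / 21.4) - 1.0) / 0.00437


def erb_bandwidth(freq_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter centered at freq_hz."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def gammatone_center_frequencies(num_channels, f_min, f_max):
    """Center frequencies equally spaced on the ERB-rate scale from f_min to f_max."""
    lo, hi = erb_rate(f_min), erb_rate(f_max)
    step = (hi - lo) / (num_channels - 1)
    return [inverse_erb_rate(lo + i * step) for i in range(num_channels)]


def gammatone_impulse_response(fc, sample_rate, duration_s=0.05, order=4):
    """Unnormalized gammatone: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    b = 1.019 * erb_bandwidth(fc)  # assumed bandwidth parameter
    n_samples = int(round(duration_s * sample_rate))
    return [
        (t ** (order - 1)) * math.exp(-2 * math.pi * b * t) * math.cos(2 * math.pi * fc * t)
        for t in (i / sample_rate for i in range(n_samples))
    ]


if __name__ == "__main__":
    # Hypothetical settings: 64 channels between 50 Hz and 8 kHz.
    centers = gammatone_center_frequencies(64, 50.0, 8000.0)
    for fc in centers[:3] + centers[-3:]:
        print(f"{fc:8.1f} Hz  ERB = {erb_bandwidth(fc):7.1f} Hz")
```

Convolving a signal with each channel's impulse response, framing the outputs, and taking per-frame energy (often log- or cube-root-compressed) would yield a channels-by-frames cochleogram that can be fed to a 2-D CNN; that step is omitted here to keep the sketch short.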

Pages: 154-175

Number of pages: 22
