Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Cited by: 0

Authors:

Lambamo, Wondimu [1]

Srinivasagan, Ramasamy [1,2]

Jifara, Worku [1]

Affiliations:

[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia

[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia

Source:

PAN-AFRICAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PT I, PANAFRICON AI 2023 | 2024 / Vol. 2068

Keywords:

Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises; FEATURES; MFCC; VERIFICATION;

DOI:

10.1007/978-3-031-57624-9_9

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker identification is a biometric mechanism that determines which person is speaking from a set of known speakers. It has vital applications in areas such as security, surveillance, and forensic investigation. Speaker identification systems achieve good accuracy on clean speech. However, their performance degrades under noisy and mismatched conditions. Recently, hybrid networks that combine convolutional neural networks (CNNs) with enhanced recurrent neural network (RNN) variants have shown improved performance in speech recognition, image classification, and other pattern recognition tasks. Moreover, cochleogram features have shown better accuracy in speech and speaker recognition under noisy conditions. However, no attempt has been made to apply hybrid CNN and enhanced RNN variants to cochleogram input for speaker recognition in order to improve accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions that uses a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network on cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNRs) from -5 dB to 20 dB in 5 dB steps. The proposed model attained accuracies of 93.15%, 97.55%, and 98.60% on the dataset with real-world noise at SNRs of -5 dB, 10 dB, and 20 dB, respectively. The proposed model shows approximately similar performance on real-world noise and WGN at the same SNR levels. On the dataset without additive noise, the model achieved 98.85% accuracy. The evaluation results and a comparison with previous work indicate that the proposed model achieves higher accuracy.
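
The noise-mixing step described in the abstract can be illustrated with a minimal sketch. It assumes the common definition SNR_dB = 10 * log10(P_speech / P_noise) with power measured as the mean of squared samples; the abstract does not state how the authors scaled the noise, so this is an illustration and not their procedure. The function names (`mean_power`, `mix_at_snr`, `measured_snr_db`) and the synthetic sine and Gaussian signals are hypothetical stand-ins, not VoxCeleb1 data.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech-to-noise power ratio equals snr_db, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # Assumed definition: SNR_dB = 10 * log10(P_speech / P_noise).
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture, given the clean speech it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


# SNR grid stated in the abstract: -5 dB up to 20 dB in 5 dB steps.
SNR_LEVELS_DB = list(range(-5, 25, 5))

rng = random.Random(0)
# Toy stand-ins: a 220 Hz sine for speech and white Gaussian noise.
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for snr_db in SNR_LEVELS_DB:
    noisy = mix_at_snr(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noisy), 2))
```

Scaling the noise rather than the speech keeps the speech level fixed across SNR conditions, which is one common design choice; a real pipeline would also need to crop or loop the noise recording to match each utterance's length.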


Pages: 154-175

Number of pages: 22
