Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

Cited by: 0
Authors
Lambamo, Wondimu [1]
Srinivasagan, Ramasamy [1, 2]
Jifara, Worku [1]
Affiliations
[1] Adama Sci & Technol Univ, Adama 1888, Ethiopia
[2] King Faisal Univ, Al Hasa 31982, Saudi Arabia
Source
PAN-AFRICAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PT I, PANAFRICON AI 2023 | 2024 / Vol. 2068
Keywords
Speaker Identification; Convolutional Neural Network; Cochleogram; Bidirectional Gated Recurrent Unit; Real-World Noises; FEATURES; MFCC; VERIFICATION
DOI
10.1007/978-3-031-57624-9_9
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speaker identification is a biometric task that determines which person, from a set of known speakers, is speaking. It has important applications in security, surveillance, forensic investigation, and other areas. Speaker identification systems achieve good accuracy on clean speech, but their performance degrades under noisy and mismatched conditions. Recently, hybrid networks that combine convolutional neural networks (CNNs) with enhanced recurrent neural network (RNN) variants have performed well in speech recognition, image classification, and other pattern recognition tasks. In addition, cochleogram features have shown better accuracy in speech and speaker recognition under noisy conditions. However, no previous work has used hybrid CNN and enhanced RNN variants with cochleogram input for speaker recognition to improve accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions based on a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network operating on cochleogram input. The models were evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNRs) from -5 dB to 20 dB in 5 dB steps. The proposed model attained accuracies of 93.15%, 97.55%, and 98.60% on the dataset with real-world noise at SNRs of -5 dB, 10 dB, and 20 dB, respectively. The proposed model performs approximately the same on real-world noise and WGN at matching SNR levels. On the dataset without additive noise, the model achieved 98.85% accuracy. The evaluation results and the comparison with previous works indicate that the proposed model achieves higher accuracy.
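The abstract states that real-world noise and WGN were added to VoxCeleb1 at SNRs from -5 dB to 20 dB in 5 dB steps, but it does not describe the mixing procedure. The sketch below shows one common convention: scale the noise by a gain g so that 10·log10(P_speech / (g²·P_noise)) equals the target SNR. It assumes power is measured over the whole utterance and that short noise recordings are tiled; the helper names (`signal_power`, `add_noise_at_snr`, `white_gaussian_noise`) are hypothetical and not taken from the paper.

```python
import math
import random

# Illustrative sketch: the mixing convention here is an assumption,
# not the authors' documented procedure.


def signal_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen so the mix has SNR `snr_db`.

    g satisfies 10 * log10(P_speech / (g**2 * P_noise)) == snr_db,
    i.e. g = sqrt(P_speech / (P_noise * 10 ** (snr_db / 10))).
    Noise shorter than the speech is tiled; longer noise is truncated.
    """
    noise = list(noise)
    if not noise:
        raise ValueError("empty noise signal")
    if len(noise) < len(speech):
        noise = noise * math.ceil(len(speech) / len(noise))
    noise = noise[: len(speech)]
    p_speech, p_noise = signal_power(speech), signal_power(noise)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def white_gaussian_noise(n, seed=0):
    """n samples of zero-mean, unit-variance white Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# SNR grid stated in the abstract: -5 dB up to 20 dB in 5 dB steps.
SNR_LEVELS_DB = list(range(-5, 21, 5))  # [-5, 0, 5, 10, 15, 20]
```

For a real-world noise recording, pass its samples as `noise`; the same gain rule applies. Some toolkits measure speech power only over voiced frames, which would yield a different gain for the same nominal SNR.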
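The abstract relies on cochleogram features but does not say how they are computed. A cochleogram is commonly built by passing the signal through a gammatone filterbank whose centre frequencies are equally spaced on the ERB-rate scale (Glasberg and Moore), then taking short-time energy in each channel. The following dependency-free sketch assumes a 4th-order gammatone with bandwidth factor 1.019, 20 ms frames with a 10 ms hop, log compression, and a 50 Hz lower edge; these settings and all function names are illustrative assumptions, not the authors' configuration.

```python
import math

# Illustrative cochleogram sketch; every parameter default is an assumption.


def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)


def hz_to_erb_rate(f):
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)


def erb_rate_to_hz(e):
    return (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def erb_space(f_low, f_high, n_channels):
    """n_channels centre frequencies equally spaced on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]


def gammatone_ir(fc, sr, duration=0.025, order=4, b=1.019):
    """Truncated gammatone impulse response, scaled to unit gain at fc."""
    ir = []
    for k in range(int(round(duration * sr))):
        t = k / sr
        envelope = t ** (order - 1) * math.exp(-2 * math.pi * b * erb(fc) * t)
        ir.append(envelope * math.cos(2 * math.pi * fc * t))
    # Magnitude of the discrete-time frequency response at fc.
    re = sum(h * math.cos(2 * math.pi * fc * k / sr) for k, h in enumerate(ir))
    im = sum(h * math.sin(2 * math.pi * fc * k / sr) for k, h in enumerate(ir))
    gain = math.hypot(re, im) or 1.0
    return [h / gain for h in ir]


def fir_filter(x, h):
    """Causal direct-form FIR filter: y[n] = sum_k h[k] * x[n - k]."""
    return [
        sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
        for n in range(len(x))
    ]


def cochleogram(x, sr, n_channels=64, f_low=50.0, f_high=None,
                win=0.020, hop=0.010):
    """Log mean-square energy per gammatone channel and frame.

    Returns n_channels rows (low to high frequency), one value per frame.
    """
    if f_high is None:
        f_high = 0.45 * sr  # stay below Nyquist (assumed choice)
    win_n, hop_n = int(round(win * sr)), int(round(hop * sr))
    rows = []
    for fc in erb_space(f_low, f_high, n_channels):
        y = fir_filter(x, gammatone_ir(fc, sr))
        rows.append([
            math.log10(sum(v * v for v in y[s:s + win_n]) / win_n + 1e-10)
            for s in range(0, len(y) - win_n + 1, hop_n)
        ])
    return rows
```

The result is a channels-by-frames matrix, the kind of 2-D time-frequency input a CNN front end would consume. Direct convolution keeps the sketch self-contained but is slow; practical pipelines typically use recursive (IIR) gammatone implementations or vectorized libraries.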
Pages: 154-175
Page count: 22