Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning

Authors
El-Moneim S.A. [1,2]
El-Mordy E.A. [1]
Nassar M.A. [1]
Dessouky M.I. [1]
Ismail N.A. [3]
El-Fishawy A.S. [1]
El-Dolil S. [1]
El-Dokany I.M. [1]
El-Samie F.E.A. [1,4]
Affiliations
[1] Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf
[2] High Institute of Engineering and Technology, Tanta
[3] Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
[4] Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh
Keywords
Feature extraction; LSTM RNN; Radon transform; Reverberation; Speaker recognition; Spectrogram; Speech enhancement; Text-independent speaker recognition
DOI
10.1007/s10772-021-09880-6
Abstract
Automatic Speaker Recognition (ASR) under mismatched conditions is a challenging task, since robust feature extraction and classification techniques are required. The Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is an efficient network that can learn to recognize speakers text-independently when the recording circumstances are similar; unfortunately, its performance degrades when the recording circumstances differ. In this paper, the Radon projection of the spectrograms of speech signals is used to extract features, since the Radon Transform (RT) is less sensitive to noise and reverberation. The Radon projection is applied to the spectrograms of speech signals, and then the 2-D Discrete Cosine Transform (DCT) is computed. This technique improves text-independent recognition accuracy with reduced sensitivity to noise and reverberation effects. The performance of the ASR system with the proposed features is compared to that of systems based on Mel Frequency Cepstral Coefficients (MFCCs) and spectrum features. For noisy utterances at 25 dB, the recognition rate with the proposed features reaches 80%, while it is 27% and 28% with MFCCs and spectrum features, respectively. For reverberant speech, the recognition rate reaches 80.67% with the proposed features, compared with 54% and 62.67% for MFCCs and spectrum features, respectively. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
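The feature-extraction pipeline described in the abstract (spectrogram, then Radon projections, then 2-D DCT) can be sketched as follows. This is a minimal illustration only: the sampling rate, window length, projection angles, and the 32 x 32 block of retained DCT coefficients are illustrative assumptions, not values reported in the paper, and the LSTM-RNN classifier stage is not reproduced here.

# A minimal sketch of the Radon-spectrogram feature pipeline, assuming
# common analysis parameters; not the authors' exact configuration.
import numpy as np
from scipy.signal import spectrogram
from scipy.fft import dctn
from skimage.transform import radon

def radon_spectrogram_features(signal, fs=16000, n_angles=180, n_coeffs=32):
    """Spectrogram -> Radon projections -> 2-D DCT -> low-order coefficients."""
    # 1. Log-magnitude spectrogram of the utterance (treated as a 2-D image).
    _, _, Sxx = spectrogram(signal, fs=fs, nperseg=512, noverlap=256)
    log_spec = np.log(Sxx + 1e-10)

    # 2. Radon transform: line projections of the spectrogram over a set of angles.
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(log_spec, theta=theta, circle=False)

    # 3. 2-D DCT of the sinogram; most energy compacts into low-order coefficients.
    coeffs = dctn(sinogram, norm='ortho')

    # 4. Keep the top-left n_coeffs x n_coeffs block, flattened, as the feature
    #    vector that would be fed to a classifier such as an LSTM-RNN.
    return coeffs[:n_coeffs, :n_coeffs].ravel()

# Example: features for one second of synthetic "speech" at 16 kHz.
features = radon_spectrogram_features(np.random.randn(16000))
print(features.shape)  # (1024,)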
Pages: 679-687 (8 pages)