Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning

Authors
El-Moneim S.A. [1,2]
El-Mordy E.A. [1]
Nassar M.A. [1]
Dessouky M.I. [1]
Ismail N.A. [3]
El-Fishawy A.S. [1]
El-Dolil S. [1]
El-Dokany I.M. [1]
El-Samie F.E.A. [1,4]
Affiliations
[1] Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf
[2] High Institute of Engineering and Technology, Tanta
[3] Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf 32952, Egypt
[4] Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh
Keywords
Feature extraction; LSTM RNN; Radon transform; Reverberation; Speaker recognition; Spectrogram; Speech enhancement; Text-independent speaker recognition
DOI
10.1007/s10772-021-09880-6
Abstract
Automatic Speaker Recognition (ASR) under mismatched conditions is a challenging task, since robust feature extraction and classification techniques are required. The Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is an efficient network that can learn to recognize speakers text-independently when the recording circumstances are similar; unfortunately, its performance degrades when the recording circumstances differ. In this paper, the Radon projection of the spectrograms of speech signals is used to extract features, since the Radon Transform (RT) is less sensitive to noise and reverberation. The Radon projection is applied to the spectrograms of speech signals, and then the 2-D Discrete Cosine Transform (DCT) is computed. This technique improves text-independent recognition accuracy with reduced sensitivity to noise and reverberation effects. The performance of the ASR system with the proposed features is compared to that of systems based on Mel Frequency Cepstral Coefficients (MFCCs) and spectrum features. For noisy utterances at 25 dB, the recognition rate with the proposed features reaches 80%, while it is 27% and 28% with MFCCs and spectrum features, respectively. For reverberant speech, the recognition rate reaches 80.67% with the proposed features, compared with 54% and 62.67% for MFCCs and spectrum features, respectively. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
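The feature-extraction pipeline described in the abstract (spectrogram, then Radon projections, then 2-D DCT) can be sketched as follows. This is a minimal illustration only: the sampling rate, window length, projection angles, and the 32 x 32 block of retained DCT coefficients are illustrative assumptions, not values reported in the paper, and the LSTM-RNN classifier stage is not reproduced here.

# A minimal sketch of the Radon-spectrogram feature pipeline, assuming
# common analysis parameters; not the authors' exact configuration.
import numpy as np
from scipy.signal import spectrogram
from scipy.fft import dctn
from skimage.transform import radon

def radon_spectrogram_features(signal, fs=16000, n_angles=180, n_coeffs=32):
    """Spectrogram -> Radon projections -> 2-D DCT -> low-order coefficients."""
    # 1. Log-magnitude spectrogram of the utterance (treated as a 2-D image).
    _, _, Sxx = spectrogram(signal, fs=fs, nperseg=512, noverlap=256)
    log_spec = np.log(Sxx + 1e-10)

    # 2. Radon transform: line projections of the spectrogram over a set of angles.
    theta = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = radon(log_spec, theta=theta, circle=False)

    # 3. 2-D DCT of the sinogram; most energy compacts into low-order coefficients.
    coeffs = dctn(sinogram, norm='ortho')

    # 4. Keep the top-left n_coeffs x n_coeffs block, flattened, as the feature
    #    vector that would be fed to a classifier such as an LSTM-RNN.
    return coeffs[:n_coeffs, :n_coeffs].ravel()

# Example: features for one second of synthetic "speech" at 16 kHz.
features = radon_spectrogram_features(np.random.randn(16000))
print(features.shape)  # (1024,)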
Pages: 679-687 (8 pages)