A Hybrid GRU-CNN Feature Extraction Technique for Speaker Identification

Cited by: 4

Authors:

Shihab, Md Shazzad Hossain [1]

Aditya, Shuvra [2]

Setu, Jahangir Hossain [3]

Imtiaz-Ud-Din, K. M. [1]

Efat, Md Iftekharul Alam [2]

Affiliations:

[1] Daffodil Int Univ, Dept Software Engn, Dhaka, Bangladesh

[2] Noakhali Sci & Technol Univ, Inst Informat Technol, Noakhali, Bangladesh

[3] Daffodil Int Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh

Source:

2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020) | 2020

Keywords:

Speaker Identification; Feature Extraction; GRU-CNN; Neural Network; MFCC; LPCC; LSF; END SPEECH RECOGNITION; COMPENSATION;

DOI:

10.1109/ICCIT51783.2020.9392734

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202;

Abstract:

Speaker identification across diverse voice clips from around the globe is a crucial and challenging task, especially the extraction of robust and discriminative features. In this paper, we demonstrate an end-to-end speaker identification pipeline that introduces a hybrid Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) feature extraction technique. First, the voice clip is converted to a spectrogram and processed with the GRU and CNN model; part of it is then transformed again by a residual CNN model that optimizes the subspace loss to extract the most substantial feature vector. Next, a statistics-based feature selection method is applied to combine and select the most significant features. To validate the proposed GRU-CNN feature extractor, we evaluated it on the large-scale VoxCeleb dataset, comprising 6000 real-world speakers with multiple voice samples. Finally, a comparative analysis against state-of-the-art feature extraction techniques shows a promising outcome of 91.08% accuracy, with precision and recall of 93.51% and 94.74%, respectively.
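
The first step named in the abstract, converting a voice clip to a spectrogram, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `log_spectrogram`, the frame length, hop size, FFT size, Hann window, and the synthetic test tone are all assumptions chosen for illustration, since the abstract does not give these details.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Return a log-magnitude spectrogram of shape (n_frames, n_fft // 2 + 1).

    All parameter defaults are illustrative assumptions (25 ms frames and
    10 ms hops at 16 kHz), not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT magnitude of each frame (frames are zero-padded to n_fft).
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # A small constant avoids log(0) on silent frames.
    return np.log(magnitude + 1e-8)

# Synthetic one-second 440 Hz tone at 16 kHz, standing in for a voice clip.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440.0 * t)
spec = log_spectrogram(clip)
print(spec.shape)
```

The resulting time-by-frequency array is the kind of input a GRU (along the time axis) or a CNN (over the 2D map) could consume; how the paper actually feeds the two models is not specified in the abstract.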


Pages: 6
