A Hybrid GRU-CNN Feature Extraction Technique for Speaker Identification

Cited by: 4
Authors
Shihab, Md Shazzad Hossain [1]
Aditya, Shuvra [2]
Setu, Jahangir Hossain [3]
Imtiaz-Ud-Din, K. M. [1]
Efat, Md Iftekharul Alam [2]
Affiliations
[1] Daffodil Int Univ, Dept Software Engn, Dhaka, Bangladesh
[2] Noakhali Sci & Technol Univ, Inst Informat Technol, Noakhali, Bangladesh
[3] Daffodil Int Univ, Dept Comp Sci & Engn, Dhaka, Bangladesh
Source
2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020) | 2020
Keywords
Speaker Identification; Feature Extraction; GRU-CNN; Neural Network; MFCC; LPCC; LSF; END SPEECH RECOGNITION; COMPENSATION;
DOI
10.1109/ICCIT51783.2020.9392734
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Speaker identification across diverse voice clips from around the globe is a crucial and challenging task, especially when it comes to extracting robust and discriminative features. In this paper, we demonstrate an end-to-end speaker identification pipeline that introduces a hybrid Gated Recurrent Unit (GRU) and Convolutional Neural Network (CNN) feature extraction technique. First, the voice clip is converted to a spectrogram and processed with the GRU and CNN model; part of the output is then transformed again by a residual CNN model that optimizes the subspace loss to extract the best and most substantial feature vector. Next, a statistics-based feature selection method is applied to combine and select the most significant features. To validate the proposed GRU-CNN feature extractor, we evaluated it on the large-scale VoxCeleb dataset, comprising 6000 real-world speakers with multiple voice clips each. Finally, a comparative analysis against state-of-the-art feature extraction techniques shows a promising outcome of 91.08% accuracy, with precision and recall of 93.51% and 94.74%, respectively.
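The first stage of the pipeline described in the abstract, converting a voice clip to a spectrogram, can be illustrated with a minimal sketch. The abstract does not state the spectrogram settings, so the function name `log_spectrogram`, the 16 kHz sample rate, the 400-sample Hann window, and the 160-sample hop below are all assumptions chosen for illustration, not the authors' configuration. The GRU, CNN, and residual CNN stages are not reproduced here.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Log-magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative assumptions (25 ms and 10 ms at 16 kHz),
    not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pad very short clips so that at least one full frame exists.
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the clip into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame gives frame_len // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps)


# Synthetic stand-in for a voice clip: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 440 * t)

spec = log_spectrogram(clip)
print(spec.shape)  # (time frames, frequency bins)
```

The resulting array has time along the first axis and frequency along the second, which is the kind of 2-D input a recurrent or convolutional front end would typically consume.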
Pages: 6