Speaker Identification based on MFSC voice feature extraction using Transformer

Cited by: 3
Authors
Bao, Liao [1 ]
Zuo, Yi [1 ]
Affiliations
[1] Dalian Maritime Univ, Dalian, Peoples R China
Source
2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023 | 2023
Funding
National Natural Science Foundation of China;
Keywords
Speaker Identification; voiceprint feature; extraction; MFSC; MFCC; neural network; SUPPORT VECTOR MACHINES; JOINT FACTOR-ANALYSIS;
DOI
10.1109/ICDMW60847.2023.00008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speaker identification is a biometric authentication technology that automatically determines a speaker's identity from voice parameters. Its core task is to extract, from the collected speech samples, the voice features that best reflect the speaker's individual characteristics, and to train models on these features to identify the speaker, recognize the voiceprint, and so on. In speaker identification research, voiceprint feature extraction largely determines the accuracy of the identification model. Among the many voiceprint features, Mel Frequency Cepstral Coefficients (MFCC) are widely used in voiceprint identification systems owing to the excellent performance of Mel filters. However, several studies have revealed that MFCC features are not completely correlated globally, and that only a few feature vectors are sufficient to represent most of the information in the signal. To address this limitation, we propose a new spectral representation of compressed speech, named Mel Frequency Spectral Coefficients (MFSC). In MFSC, the discrete cosine transform (DCT) used in MFCC is eliminated. In the experiments, MFCC is used as the comparison feature, and end-to-end neural networks based on bidirectional GRU, bidirectional LSTM, and Transformer serve as the identification models. On 921 voice samples from the LibriSpeech database, the MFSC model using Transformer achieves better test accuracy than the MFCC models, reducing the error rate from 0.090 to 0.079.
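The relationship between the two features can be illustrated with a minimal NumPy sketch. It assumes that MFSC corresponds to log Mel filterbank energies, that is, the MFCC pipeline stopped before the DCT; the abstract does not give the authors' exact pipeline, so the frame length, hop size, FFT size, number of filters, and all function names below are hypothetical choices made only for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def frame_signal(signal, frame_len, hop):
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mfsc(signal, sample_rate=16000, n_filters=26, n_fft=512, frame_len=400, hop=160):
    # Assumed MFSC: log Mel filterbank energies, with no DCT applied.
    frames = frame_signal(signal, frame_len, hop)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)

def mfcc_from_mfsc(log_mel, n_ceps=13):
    # MFCC adds one step: an (unnormalised) DCT-II across the filter axis,
    # keeping only the first n_ceps coefficients.
    n_filters = log_mel.shape[1]
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T

# Synthetic one-second signal standing in for a speech recording (assumption).
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
feat_mfsc = mfsc(signal)                 # one row per frame, one column per Mel filter
feat_mfcc = mfcc_from_mfsc(feat_mfsc)    # one row per frame, one column per cepstral coefficient
```

In this sketch the MFSC matrix keeps one value per Mel filter for every frame, while the MFCC matrix is a truncated linear transform of it, so a sequence model such as a Transformer would receive the untransformed filterbank features directly in the MFSC case.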
Pages: 1 / 7
Number of pages: 7