Speaker Identification based on MFSC voice feature extraction using Transformer

Cited by: 3

Authors:

Bao, Liao [1]

Zuo, Yi [1]

Affiliation:

[1] Dalian Maritime Univ, Dalian, Peoples R China

Source:

2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

Speaker Identification; voiceprint feature; extraction; MFSC; MFCC; neural network; SUPPORT VECTOR MACHINES; JOINT FACTOR-ANALYSIS;

DOI:

10.1109/ICDMW60847.2023.00008

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speaker identification is a biometric authentication technology that automatically determines a speaker's identity from voice parameters. Its core task is to extract, from the collected speech samples, the voice features that best reflect the speaker's individual characteristics, and to train models on these features to identify the speaker, recognize the voiceprint, and so on. In speaker identification research, voiceprint feature extraction largely determines the accuracy of the identification model. Among the many voiceprint features, Mel Frequency Cepstral Coefficients (MFCC) are widely used in voiceprint identification systems because of the excellent performance of Mel filters. However, several studies have shown that MFCC features are not completely correlated globally, and that only a few feature vectors are sufficient to represent most of the information in the signal. To address this limitation, we propose a new spectral representation of compressed speech, named Mel Frequency Spectral Coefficients (MFSC), in which the discrete cosine transform (DCT) is eliminated. In the experiments, MFCC serves as the comparison feature, and end-to-end bidirectional GRU, bidirectional LSTM, and Transformer networks serve as the identification models. Using 921 voice data from the LibriSpeech database, the experiments show that the MFSC model with the Transformer achieves better test accuracy than the MFCC models, reducing the error rate from 0.090 to 0.079.
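
The abstract states only that MFSC omits the DCT step used in MFCC. The relationship can be illustrated with a minimal NumPy sketch, in which the log mel filterbank energies are taken as the MFSC-style feature and a DCT-II over the filter axis yields MFCC-style coefficients. The function names, frame length, hop size, FFT size, filter count, and coefficient count below are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfsc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512, n_filters=40):
    """Log mel filterbank energies (no DCT): one row per frame, one column per filter."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    return np.log(energies + 1e-10)  # small offset avoids log(0)

def mfcc_from_mfsc(log_mel, n_coeffs=13):
    """Unnormalized DCT-II along the filter axis, keeping the first n_coeffs coefficients."""
    n_filters = log_mel.shape[1]
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T

# Synthetic one-second signal at 16 kHz stands in for a speech recording.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
spec = mfsc(audio)            # MFSC-style features
ceps = mfcc_from_mfsc(spec)   # MFCC-style features derived from them
print(spec.shape, ceps.shape)
```

In this sketch the MFSC-style matrix keeps all filterbank channels per frame, whereas the MFCC-style matrix keeps only the leading DCT coefficients, so a sequence model fed with the former sees the spectral envelope without the truncation.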

Pages: 1 / 7

Page count: 7
