A short utterance speaker recognition method with improved cepstrum–CNN

Authors
Yongfeng Li
Shuaishuai Chang
QingE Wu
Affiliations
[1] Zhengzhou University of Light Industry,School of Mathematics and Information Science
[2] Zhengzhou University of Light Industry,School of Electrical and Information Engineering
Source
SN Applied Sciences | 2022 / Vol. 4
Keywords
Short utterance; Speaker recognition; Mel frequency cepstrum coefficient; Convolutional neural network
Abstract
In this study, an improved cepstrum–convolutional neural network is proposed to address the low recognition accuracy of 1-s short utterances in speaker recognition. The Mel frequency cepstrum coefficient audio features are extracted with the improved cepstrum algorithm, and the two-dimensional acoustic feature vector matrix is preprocessed into a three-dimensional tensor that serves as the input to a two-dimensional convolutional neural network model. Experiments are carried out, in a specific experimental environment, on a dataset of English pronunciations of Arabic numerals with audio durations of less than one second, and the model's performance is evaluated by accuracy and F1-score. The simulation results show that the proposed model reaches accuracies of 100% and 99.60% on the training and test sets, respectively, with an F1-score of 0.9985. This recognition method thus overcomes the accuracy degradation that short corpus durations cause in short utterance speaker recognition and improves the accuracy of short speech voice recognition. The model is simple yet effective, generalizes well, and has high practical application value.
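The preprocessing step the abstract describes — converting the two-dimensional MFCC feature matrix into a three-dimensional tensor for a two-dimensional CNN — can be sketched as below. The coefficient count, frame count, normalization, and NumPy usage are illustrative assumptions for a roughly 1-s utterance, not details taken from the paper:

```python
import numpy as np

# Hypothetical dimensions: 13 MFCC coefficients over 98 frames
# (about 1 s of audio at a 10 ms hop); not values from the paper.
n_mfcc, n_frames = 13, 98

# Stand-in for the 2-D acoustic feature matrix produced by the
# cepstrum front end (one row per cepstral coefficient).
mfcc_matrix = np.random.default_rng(0).standard_normal((n_mfcc, n_frames))

# Per-coefficient mean/variance normalization (a common, assumed
# preprocessing choice, not necessarily the paper's).
mfcc_matrix = (mfcc_matrix - mfcc_matrix.mean(axis=1, keepdims=True)) / (
    mfcc_matrix.std(axis=1, keepdims=True) + 1e-8
)

# Append a trailing channel axis so each utterance becomes a 3-D
# tensor (height, width, channels), the per-sample shape a 2-D
# convolutional layer expects.
cnn_input = mfcc_matrix[..., np.newaxis]
print(cnn_input.shape)  # (13, 98, 1)
```

A batch of such utterances would then be stacked along a leading axis into a 4-D array of shape `(batch, 13, 98, 1)` before being fed to the network.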