Research on Robustness of Voiceprint Recognition Technology

Cited by: 2

Authors:

Shen, Haijuan [1]

Wang, Bo [1]

Wang, Junsheng [2]

Affiliations:

[1] State Grid Elect Commerce Co LTD, Beijing, Peoples R China

[2] State Grid Xiongan Financial Technol Grp Co LTD, Beijing, Peoples R China

Source:

2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018) | 2018

Keywords:

Convolution neural network; Feature extraction; Robustness; EMD; Voiceprint recognition; Endpoint detection;

DOI:

10.1145/3302425.3302467

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In the mobile Internet era, voiceprint recognition is an important means of identity authentication in social networking, shopping, online stores, financial transactions, and other activities. With the popularity of smartphones, these online activities can be carried out anytime and anywhere. Inevitably, the voice input to a voiceprint recognition system contains a variety of environmental noise. Improving the noise immunity and robustness of voiceprint recognition systems is therefore key to the wide application of the technology. This paper focuses on the key modules of a voiceprint recognition system, such as endpoint detection, feature extraction, and voiceprint modeling and matching. The mainstream technologies for each module are analyzed, and a robust voiceprint recognition method is proposed. The method uses a spectrum-based endpoint detection algorithm and applies the Empirical Mode Decomposition (EMD) algorithm to reconstruct the spectrum, which effectively denoises the voiceprint. Drawing on the strong performance of convolutional neural network (CNN) models in the image domain, a voiceprint model is constructed to improve recognition efficiency and make the voiceprint recognition system robust.
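The abstract does not say which spectrum-based endpoint detection algorithm the paper uses. As a rough illustration of the general idea only, the sketch below marks a frame as active when its spectral energy exceeds a fraction of the largest frame energy. The function names, the frame length, and the thresholding rule are all assumptions made for this example and are not taken from the paper.

```python
import cmath
import math
import random


def frame_spectral_energy(frame):
    """Return the energy of one frame summed over its positive-frequency DFT bins."""
    n = len(frame)
    energy = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        energy += abs(coeff) ** 2
    return energy / n


def detect_endpoints(signal, frame_len=64, threshold_ratio=0.1):
    """Return (start, end) sample indices of the active region, or None.

    Hypothetical rule: a frame is active when its spectral energy exceeds
    threshold_ratio times the maximum frame energy in the signal.
    """
    energies = [
        frame_spectral_energy(signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    if not energies or max(energies) == 0.0:
        return None
    threshold = threshold_ratio * max(energies)
    active = [idx for idx, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    # Start of the first active frame, end (exclusive) of the last active frame.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic input: low-level Gaussian noise with a tone burst in samples 256..511.
random.seed(0)
signal = [random.gauss(0.0, 0.01) for _ in range(768)]
for t in range(256, 512):
    signal[t] += math.sin(2 * math.pi * 8 * t / 64)

endpoints = detect_endpoints(signal)
print(endpoints)
```

A relative threshold keeps the sketch independent of recording level. A practical system would more likely estimate the noise floor from leading silence and smooth the frame decisions over time.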

References

Pages: 5

9 entries in total

[1] AUTOMATIC RECOGNITION OF SPEAKERS FROM THEIR VOICES [J].

ATAL, BS .

PROCEEDINGS OF THE IEEE, 1976, 64 (04) :460-475

[2] COMPARISON OF PARAMETRIC REPRESENTATIONS FOR MONOSYLLABIC WORD RECOGNITION IN CONTINUOUSLY SPOKEN SENTENCES [J].

DAVIS, SB ;

MERMELSTEIN, P .

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1980, 28 (04) :357-366

[3] Reducing the dimensionality of data with neural networks [J].

Hinton, G. E. ;

Salakhutdinov, R. R. .

SCIENCE, 2006, 313 (5786) :504-507

[4]

Lu Xiaoqian, 2014, SPEAKER RECOGNITION

[5]

OGLESBY J, 1990, INT CONF ACOUST SPEE, P261, DOI 10.1109/ICASSP.1990.115617

[6]

Pan J, 2012, 2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, P301, DOI 10.1109/ISCSLP.2012.6423452

[7] ROBUST TEXT-INDEPENDENT SPEAKER IDENTIFICATION USING GAUSSIAN MIXTURE SPEAKER MODELS [J].

REYNOLDS, DA ;

ROSE, RC .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1995, 3 (01) :72-83

[8] Speaker verification using adapted Gaussian mixture models [J].

Reynolds, DA ;

Quatieri, TF ;

Dunn, RB .

DIGITAL SIGNAL PROCESSING, 2000, 10 (1-3) :19-41

[9] Multiscale handwritten character recognition using CNN image filters [J].

Saatci, E ;

Tavsanoglu, V .

PROCEEDING OF THE 2002 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-3, 2002, :2044-2048
