Stochastic feature compensation methods for speaker verification in noisy environments

Cited by: 14

Authors:

Sarkar, Sourjya [1]

Rao, K. Sreenivasa [1]

Affiliations:

[1] Indian Inst Technol, Sch Informat Technol, Kharagpur 721302, W Bengal, India

Source:

APPLIED SOFT COMPUTING | 2014 / Vol. 19

Keywords:

Speaker verification; Noisy environment; Minimum mean squared error; Maximum likelihood estimate; Expectation Maximization algorithm; Gaussian Mixture Models; VECTOR NORMALIZATION; SPEECH RECOGNITION; ROBUST; ADAPTATION;

DOI:

10.1016/j.asoc.2014.02.016

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper explores the significance of stereo-based stochastic feature compensation (SFC) methods for robust speaker verification (SV) in mismatched training and test environments. Gaussian Mixture Model (GMM)-based SFC methods developed in the past have been restricted solely to speech recognition tasks. This paper proposes applying these algorithms in an SV framework for background noise compensation. A priori knowledge of the test environment and the availability of stereo training data are assumed. During the training phase, Mel frequency cepstral coefficient (MFCC) features extracted from a speaker's noisy and clean speech utterances (stereo data) are used to build front-end GMMs. During the evaluation phase, noisy test utterances are transformed on the basis of a minimum mean squared error (MMSE) or maximum likelihood estimate (MLE), using the target speaker GMMs. Experiments conducted on the NIST-2003-SRE database, with clean speech utterances artificially degraded by different types of additive noise, reveal that the proposed SV systems strictly outperform baseline SV systems in mismatched conditions across all noisy background environments. (C) 2014 Elsevier B.V. All rights reserved.
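As a rough illustration of the MMSE step mentioned in the abstract, the sketch below computes the textbook MMSE estimate of clean features from noisy ones under a joint GMM over stacked clean and noisy vectors. It is not taken from the paper: the function name `mmse_compensate`, its argument layout, and the assumption that a joint GMM over z = [x; y] has already been trained on stereo data are all hypothetical choices made for this example.

```python
import numpy as np


def mmse_compensate(y, weights, means, covs, dim_x):
    """MMSE estimate of clean features x from noisy features y.

    Assumes (hypothetically) a joint GMM over z = [x; y] trained on stereo data.
    y:       (T, Dy) noisy feature vectors
    weights: (K,) mixture weights
    means:   (K, Dx + Dy) joint means, clean part first
    covs:    (K, Dx + Dy, Dx + Dy) joint full covariances
    dim_x:   dimension Dx of the clean feature vector
    """
    y = np.atleast_2d(y)
    n_frames, dim_y = y.shape
    n_comp = len(weights)
    log_post = np.empty((n_frames, n_comp))
    cond_means = np.empty((n_frames, n_comp, dim_x))

    for k in range(n_comp):
        mu_x, mu_y = means[k, :dim_x], means[k, dim_x:]
        s_xy = covs[k, :dim_x, dim_x:]
        s_yy = covs[k, dim_x:, dim_x:]

        diff = y - mu_y                          # (T, Dy)
        solved = np.linalg.solve(s_yy, diff.T)   # S_yy^{-1} (y - mu_y), shape (Dy, T)

        # Log of w_k * N(y; mu_y, S_yy), the unnormalized posterior of component k.
        maha = np.sum(diff.T * solved, axis=0)
        _, logdet = np.linalg.slogdet(s_yy)
        log_post[:, k] = np.log(weights[k]) - 0.5 * (
            maha + logdet + dim_y * np.log(2.0 * np.pi)
        )

        # Per-component conditional mean E[x | y, k].
        cond_means[:, k, :] = mu_x + (s_xy @ solved).T

    # Normalize posteriors in the log domain for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Posterior-weighted sum of the conditional means.
    return np.einsum("tk,tkd->td", post, cond_means)
```

In a pipeline of the kind the abstract describes, `y` would hold the MFCC frames of a noisy test utterance and the returned array would replace them before scoring. How the GMM is actually trained and structured in the paper is not specified here.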

Pages: 198-214

Page count: 17

