Fast i-vector denoising using MAP estimation and a noise distributions database for robust speaker recognition

Times cited: 13

Authors:

Ben Kheder, Waad [1]

Matrouf, Driss [1]

Bousquet, Pierre-Michel [1]

Bonastre, Jean-Francois [1]

Ajili, Moez [1]

Affiliation:

[1] Univ Avignon, LIA, Agroparc BP 1228, F-84911 Avignon 9, France

Source:

COMPUTER SPEECH AND LANGUAGE | 2017 / Vol. 45

Keywords:

i-vectors; MAP adaptation; Speaker recognition; Additive noise; Stochastic feature; Verification

DOI:

10.1016/j.csl.2016.12.007

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Since the i-vector paradigm was introduced in the field of speaker recognition, many techniques have been proposed to deal with additive noise within this framework. Because the effect of noise in the i-vector space is complex, much of the effort has gone into handling noise in other domains (speech enhancement, feature compensation, robust i-vector extraction and robust scoring). To our knowledge, there has been no serious attempt to handle the noise problem directly in the i-vector space without relying on data distributions computed in a prior domain. The aim of this paper is twofold. First, it proposes a full-covariance Gaussian model of the clean i-vectors and of the noise distribution in the i-vector space, and introduces a technique, i-MAP, that estimates a clean i-vector given its noisy version and the noise density function using the MAP approach. On NIST data, we show that the baseline system performance can be improved by up to 60%. Second, to make this algorithm usable in real applications and to reduce the computational time needed by i-MAP, we propose an extension that builds a noise distribution database in the i-vector space in an off-line step and uses it later in the test phase. We show that, with a sufficiently large noise distribution database, this approach achieves comparable results (up to 57% relative EER improvement). (C) 2017 Elsevier Ltd. All rights reserved.
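The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It assumes, for illustration only, that a noisy i-vector is the clean one plus an independent additive term in the i-vector space (y = x + n), with both x and n full-covariance Gaussian; under that assumption the posterior of x is Gaussian and the MAP estimate has a closed form. The function names, the synthetic data, and the maximum-likelihood rule used to pick an entry from the noise database are hypothetical stand-ins, not the paper's exact procedure.

```python
import numpy as np


def fit_gaussian(samples):
    """Full-covariance Gaussian fit to row vectors; returns (mean, cov)."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)


def imap_denoise(y, mu_x, sigma_x, mu_n, sigma_n):
    """MAP estimate of clean i-vector(s) under the ASSUMED model y = x + n.

    With x ~ N(mu_x, sigma_x) and n ~ N(mu_n, sigma_n) independent, the
    posterior p(x | y) is Gaussian, so its mode equals its mean:
        x_hat = mu_x + sigma_x (sigma_x + sigma_n)^{-1} (y - mu_x - mu_n)
    `y` may be one vector of shape (d,) or a batch of shape (m, d).
    """
    residual = np.atleast_2d(y) - mu_x - mu_n                 # (m, d)
    solved = np.linalg.solve(sigma_x + sigma_n, residual.T)   # (d, m)
    x_hat = mu_x + (sigma_x @ solved).T                       # (m, d)
    return x_hat.reshape(np.shape(y))


def gaussian_loglik(y, mean, cov):
    """Log-density of each row of y under N(mean, cov)."""
    r = np.atleast_2d(y) - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(r * np.linalg.solve(cov, r.T).T, axis=1)
    return -0.5 * (r.shape[1] * np.log(2 * np.pi) + logdet + maha)


def pick_noise_model(y, mu_x, sigma_x, noise_db):
    """HYPOTHETICAL lookup rule for the noise-database variant: return the
    index of the (mu_n, sigma_n) entry under which the implied noisy density
    N(mu_x + mu_n, sigma_x + sigma_n) gives y the highest log-likelihood."""
    scores = [gaussian_loglik(y, mu_x + mu_n, sigma_x + sigma_n).sum()
              for mu_n, sigma_n in noise_db]
    return int(np.argmax(scores))


def demo(seed=0, d=10, m=2000):
    """Synthetic check: random vectors stand in for real i-vectors."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((d, d))
    clean = rng.multivariate_normal(rng.standard_normal(d),
                                    a @ a.T / d + np.eye(d), size=m)
    # Three hypothetical noise conditions (true offset and covariance).
    true_conditions = [(scale * rng.standard_normal(d), (0.5 + scale) * np.eye(d))
                       for scale in (0.0, 1.0, 2.0)]
    # Off-line step: fit one Gaussian per condition to noise samples
    # (playing the role of noisy-minus-clean i-vector differences).
    noise_db = [fit_gaussian(rng.multivariate_normal(mu, cov, size=m))
                for mu, cov in true_conditions]
    # Test phase: corrupt the clean set with condition 1, then denoise.
    mu_true, cov_true = true_conditions[1]
    noisy = clean + rng.multivariate_normal(mu_true, cov_true, size=m)
    mu_x, sigma_x = fit_gaussian(clean)
    k = pick_noise_model(noisy, mu_x, sigma_x, noise_db)
    denoised = imap_denoise(noisy, mu_x, sigma_x, *noise_db[k])
    mse_noisy = float(np.mean((noisy - clean) ** 2))
    mse_denoised = float(np.mean((denoised - clean) ** 2))
    return k, mse_noisy, mse_denoised


if __name__ == "__main__":
    print(demo())
```

Running the script prints the selected database index and the mean squared error of the noisy and denoised vectors against the clean ones. For simplicity, the lookup here scores the whole synthetic batch, whereas a real test segment would yield a single i-vector. The MAP step calls `np.linalg.solve` instead of forming an explicit inverse, which is the usual choice for numerical stability.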

Pages: 104-122

Page count: 19

References: 36
