Joint Speech Enhancement and Speaker Identification Using Approximate Bayesian Inference

Times cited: 12

Authors:

Maina, Ciira Wa [1]

Walsh, John MacLaren [1]

Affiliation:

[1] Drexel Univ, Dept Elect & Comp Engn, Philadelphia, PA 19104 USA

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 6

Funding:

US National Science Foundation;

Keywords:

Speech enhancement; speaker identification; variational Bayesian inference; NOISE; VOICE; RECOGNITION; MODEL;

DOI:

10.1109/TASL.2010.2092767

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

We present a variational Bayesian algorithm for joint speech enhancement and speaker identification that makes use of speaker-dependent speech priors. Our work is built on the intuition that speaker-dependent priors would work better than priors that attempt to capture global speech properties. We derive an iterative algorithm that exchanges information between the speech enhancement and speaker identification tasks: with cleaner speech we are able to make better identification decisions, and with the speaker-dependent priors we are able to improve speech enhancement performance. We present experimental results using the TIMIT data set which confirm the speech enhancement performance of the algorithm, measured by signal-to-noise ratio (SNR) improvement and by perceptual quality improvement via the Perceptual Evaluation of Speech Quality (PESQ) score. We also demonstrate the ability of the algorithm to perform voice activity detection (VAD). The experimental results also demonstrate that speaker identification accuracy is improved.
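The abstract describes an iterative scheme in which speaker identification and speech enhancement refine each other. The sketch below illustrates that feedback loop with a deliberately simplified, hypothetical model that is not the authors' algorithm: each speaker s is assumed to have a fixed per-bin speech variance `speaker_vars[s][k]`, clean coefficients are zero-mean Gaussian given the speaker, and the noise is white Gaussian with known variance. Under these assumptions, a mean-field factorization q(x)q(s) alternates between a speaker posterior (from the expected log prior of the current speech estimate) and a Wiener-like speech posterior (shrinkage weighted by the speaker posterior). All names, the model, the initialization, and the synthetic numbers are illustrative choices. SNR improvement is computed here as output SNR minus input SNR in dB; the paper's exact measure may differ.

```python
import math
import random


def vb_joint_enhance_identify(y, speaker_vars, noise_var, prior=None, n_iter=10):
    """Toy mean-field VB for y[t][k] = x[t][k] + n[t][k] (hypothetical model).

    Given speaker s: x[t][k] ~ N(0, speaker_vars[s][k]); noise ~ N(0, noise_var).
    Returns (q_s, x_mean): speaker posterior and posterior-mean speech estimate.
    """
    S, K, T = len(speaker_vars), len(speaker_vars[0]), len(y)
    log_prior = [math.log(p) for p in (prior or [1.0 / S] * S)]
    # Crude initialization (an assumption): treat noisy frames as clean speech.
    x_mean = [frame[:] for frame in y]
    x_var = [0.0] * K
    q_s = [1.0 / S] * S
    for _ in range(n_iter):
        # E_q(x)[x^2] summed over frames, per frequency bin.
        second = [sum(f[k] ** 2 for f in x_mean) + T * x_var[k] for k in range(K)]
        # Speaker update: log q(s) = log p(s) + E_q(x)[log p(x | s)] + const.
        scores = []
        for s in range(S):
            total = log_prior[s]
            for k in range(K):
                v = speaker_vars[s][k]
                total += -0.5 * T * math.log(2 * math.pi * v) - 0.5 * second[k] / v
            scores.append(total)
        top = max(scores)
        weights = [math.exp(sc - top) for sc in scores]  # stable normalization
        z = sum(weights)
        q_s = [w / z for w in weights]
        # Speech update: Gaussian q(x) with precision E_q(s)[1/v_s[k]] + 1/noise_var.
        precision = [
            sum(q_s[s] / speaker_vars[s][k] for s in range(S)) + 1.0 / noise_var
            for k in range(K)
        ]
        x_var = [1.0 / p for p in precision]
        x_mean = [[f[k] / (noise_var * precision[k]) for k in range(K)] for f in y]
    return q_s, x_mean


def snr_db(clean, estimate):
    """SNR of an estimate against the clean reference, in dB."""
    signal = sum(c * c for frame in clean for c in frame)
    error = sum((c - e) ** 2 for fc, fe in zip(clean, estimate) for c, e in zip(fc, fe))
    return 10.0 * math.log10(signal / error)


# Synthetic demo with three hypothetical speakers and 8 frequency bins.
rng = random.Random(0)
speaker_vars = [[4.0] * 4 + [0.25] * 4, [0.25] * 4 + [4.0] * 4, [1.0] * 8]
noise_var, true_s, T = 0.25, 1, 200
clean = [[rng.gauss(0.0, math.sqrt(v)) for v in speaker_vars[true_s]] for _ in range(T)]
noisy = [[c + rng.gauss(0.0, math.sqrt(noise_var)) for c in f] for f in clean]

q_s, enhanced = vb_joint_enhance_identify(noisy, speaker_vars, noise_var)
improvement = snr_db(clean, enhanced) - snr_db(clean, noisy)
print("speaker posterior:", [round(p, 3) for p in q_s])
print("SNR improvement (dB):", round(improvement, 2))
```

Once the speaker posterior concentrates on one speaker, the speech update reduces to a per-bin Wiener gain v/(v + noise_var) for that speaker, which is one simple way to see how a speaker-dependent prior can sharpen enhancement relative to a single averaged prior.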

Pages: 1517-1529

Number of pages: 13

References: 46
