USING CONTEXTUAL INFORMATION IN JOINT FACTOR EIGENSPACE MLLR FOR SPEECH RECOGNITION IN DIVERSE SCENARIOS

Times cited: 0

Authors:

Saz, Oscar [1]

Hain, Thomas [1]

Affiliation:

[1] Univ Sheffield, Speech & Hearing Res Grp, Sheffield, S Yorkshire, England

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

Speech recognition; adaptation; eigenspace MLLR; joint factorisation; metadata; SPEAKER; ADAPTATION

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper presents a new approach for rapid adaptation in highly diverse scenarios that exploits information describing the input signals. We introduce a new method for jointly factorising the background and the speaker within an eigenspace MLLR framework: Joint Factor Eigenspace MLLR (JFEMLLR). We further propose using contextual information about the speaker and background, such as tags or richer metadata, to estimate the best MLLR transformation for an utterance immediately. This gives instant adaptation, because no transcription from a previous decoding pass is required. Evaluation on a highly diverse Automatic Speech Recognition (ASR) task, a modified version of WSJCAM0, yields an improvement of 26.9% over the baseline, an additional 1.2% reduction compared with two-pass MLLR adaptation.
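The abstract combines two ideas: expressing an MLLR transform through speaker and background factors in an eigenspace, and selecting that transform from metadata tags rather than from a first decoding pass. The Python sketch below illustrates both under stated assumptions; it is not the authors' JFEMLLR implementation. Following the general eigenspace-MLLR idea, `eigen_bases` derives basis transforms by PCA over vectorised training transforms. The additive joint form W = W0 + Σ a_i S_i + Σ b_j B_j, the tag-to-weight lookup tables, all function names, and the toy values are hypothetical. The mean update uses the standard MLLR extended mean vector ξ = [1, μᵀ]ᵀ, so that μ̂ = Wξ.

```python
import numpy as np

# Toy sketch (not the paper's implementation): an MLLR mean transform W of
# shape (d, d+1) is built as a mean transform plus weighted speaker and
# background basis transforms, with weights looked up from metadata tags.


def eigen_bases(train_transforms, k):
    """PCA over vectorised training transforms: mean transform and top-k bases."""
    shape = train_transforms[0].shape
    X = np.stack([np.asarray(T, dtype=float).ravel() for T in train_transforms])
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions of the centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean.reshape(shape), [Vt[i].reshape(shape) for i in range(k)]


def combine_transform(W0, spk_bases, spk_w, bg_bases, bg_w):
    """Assumed additive joint form: W = W0 + sum_i a_i S_i + sum_j b_j B_j."""
    W = np.array(W0, dtype=float)
    for a, S in zip(spk_w, spk_bases):
        W += a * S
    for b, B in zip(bg_w, bg_bases):
        W += b * B
    return W


def weights_from_tags(tag_table, tags):
    """Average the stored weight vectors of all known tags (hypothetical lookup)."""
    vecs = [np.asarray(tag_table[t], dtype=float) for t in tags if t in tag_table]
    if not vecs:
        raise ValueError("no known tags for this utterance")
    return np.mean(vecs, axis=0)


def adapt_mean(W, mu):
    """Standard MLLR mean update: mu_hat = W @ [1, mu]."""
    xi = np.concatenate(([1.0], mu))  # extended mean vector
    return W @ xi


# Toy 2-dimensional example with hand-set bases (no training data involved).
d = 2
W0 = np.hstack([np.zeros((d, 1)), np.eye(d)])  # identity transform, zero bias
spk_bases = [np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])]
bg_bases = [np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])]
speaker_tags = {"female": [1.0], "male": [-1.0]}  # hypothetical tag -> weights
background_tags = {"clean": [0.0], "car": [2.0]}

W = combine_transform(
    W0,
    spk_bases, weights_from_tags(speaker_tags, ["female"]),
    bg_bases, weights_from_tags(background_tags, ["car"]),
)
mu_hat = adapt_mean(W, np.array([0.5, -0.5]))
print(mu_hat)  # [1.5 0.5]
```

Because the weights come from tags known before decoding, the adapted means are available for the first and only recognition pass. In a real system the bases would presumably come from `eigen_bases` on many training transforms and the tag tables from weights estimated on tagged training utterances; the toy example above hand-sets both for clarity.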

Pages: 5
