Feature Vector Normalization with Combined Standard and Throat Microphones for Robust ASR

Cited by: 0

Authors:

Buera, Luis ^{[1]}

Miguel, Antonio ^{[1]}

Saz, Oscar ^{[1]}

Ortega, Alfonso ^{[1]}

Lleida, Eduardo ^{[1]}

Affiliation:

[1] Univ Zaragoza, Commun Technol Grp GTC, E-50009 Zaragoza, Spain

Source:

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008

Keywords:

Throat microphone; robust speech recognition; feature vector normalization

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose an on-line unsupervised compensation technique for robust speech recognition that combines standard and throat microphone feature vectors. The solution, called Multi-Environment Model-based Linear Normalization with Throat microphone information (MEMLINT), is an extension of the MEMLIN formulation. The standard microphone noisy space and the throat microphone space are modelled as GMMs, and a set of linear transformations, one associated with each pair of Gaussians (one Gaussian from each GMM), is learnt from stereo training data. In addition, to compensate for kinds of degradation that MEMLINT does not consider, we propose to use it jointly with an on-line unsupervised acoustic model adaptation method based on rotation transformations over an expanded HMM-state space (augMented stAte space acousTic dEcoder, MATE). Experiments on a database recorded by the authors show that the proposed approach significantly outperforms the single-microphone approach.
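The abstract's idea of applying linear transformations tied to pairs of Gaussians can be illustrated with a minimal NumPy sketch. It is not the paper's MEMLINT formulation: the function names, the diagonal-covariance GMMs, the per-pair parameters `A` and `b`, and the choice to weight each pair by the product of the two independent posteriors are all assumptions made for illustration only.

```python
import numpy as np

def diag_gmm_posteriors(v, weights, means, variances):
    """Posterior p(component | v) for a diagonal-covariance GMM (hypothetical helper)."""
    diff = v - means                                   # shape (K, D)
    log_lik = -0.5 * np.sum(diff**2 / variances
                            + np.log(2 * np.pi * variances), axis=1)
    log_post = np.log(weights) + log_lik
    log_post -= log_post.max()                         # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

def normalize_frame(y, t, noisy_gmm, throat_gmm, A, b):
    """Soft combination of per-pair linear transforms of the noisy frame y.

    y: standard-microphone feature vector, shape (D,)
    t: throat-microphone feature vector, shape (D,)
    noisy_gmm, throat_gmm: tuples (weights, means, variances)
    A: per-pair matrices, shape (Ky, Kt, D, D); b: per-pair biases, shape (Ky, Kt, D)
    Assumption: pair weights are the product of the two independent posteriors.
    """
    p_y = diag_gmm_posteriors(y, *noisy_gmm)           # shape (Ky,)
    p_t = diag_gmm_posteriors(t, *throat_gmm)          # shape (Kt,)
    w = np.outer(p_y, p_t)                             # shape (Ky, Kt), sums to 1
    est = np.einsum('ijde,e->ijd', A, y) + b           # per-pair estimates
    return np.einsum('ij,ijd->d', w, est)              # weighted average
```

In this sketch the pair parameters `A` and `b` would be estimated beforehand from stereo training data, as the abstract describes; with identity matrices and zero biases the function returns the input frame unchanged.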


Pages: 1289-1292

Number of pages: 4

