Reconstruction of missing features by means of multivariate Laplace distribution (MLD) for noise robust speech recognition

Cited by: 1

Authors:

Mohammadi, Arash [1]

Almasganj, Farshad [1]

Affiliations:

[1] Amirkabir Univ Technol, Dept Biomed Engn, Tehran, Iran

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2011 / Vol. 38 / No. 4

Keywords:

Signal to noise ratio (SNR); ENHANCEMENT; SENTENCES;

DOI:

10.1016/j.eswa.2010.09.053

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Speech recognition accuracy degrades in the presence of additive noise, especially when the recognizer's training data is clean. Several methods have been proposed to compensate for the effects of noise on recognition accuracy. Among these methods, Missing Feature Techniques (MFT) have shown promising results. Two different MF approaches have been introduced in the literature: "Model-Based" and "Feature-Based" approaches. In the first category, the state distribution calculations must be changed, and some modifications are required to cope with filter bank features. In the second category, compensated representations of the corrupted signals are reconstructed prior to recognition, and conventional recognizers using MFCC features are then used. In "Feature-Based" MFT, the spectral vectors of speech signal frames are conventionally modeled by a Gaussian distribution (GD), and the missing parts of the speech representation are reconstructed according to the estimated parameters of the models. In this paper, we consider prior studies that suggest the multivariate Laplace distribution (MLD) is a suitable distribution for modeling the speech signal. Here, we examine this idea for modeling the log spectral representation of speech frames and show that MLD performs better than the Gaussian distribution. Moreover, we apply Maximum Likelihood (ML) estimation of the missing elements, conditioned on the observed values, with respect to MLD and prove that the estimation equations are simple and tractable. By using this estimation in the reconstruction of missing features, we obtain better phoneme recognition accuracy in noisy conditions than with GD. At SNR values below 10 dB, across all of the noise types, MLD improves the recognition accuracy by more than 4% in most cases. (C) 2010 Elsevier Ltd. All rights reserved.
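
The "Feature-Based" reconstruction step described in the abstract can be illustrated with a minimal sketch of the conventional Gaussian baseline only. It assumes a single Gaussian with known mean and covariance over log spectral vectors and a given reliability mask, and fills the unreliable components with their conditional mean given the reliable ones. The function name `reconstruct_missing` and all parameter names are hypothetical; this is not the paper's MLD estimator, whose equations the abstract does not give.

```python
import numpy as np

def reconstruct_missing(x, reliable, mean, cov):
    """Fill unreliable entries of x with their Gaussian conditional mean.

    x        : feature vector (values at unreliable positions are ignored)
    reliable : boolean mask, True where the component is trusted
    mean, cov: parameters of the assumed single-Gaussian model
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    missing = ~reliable

    out = x.copy()
    if not missing.any():
        return out                      # nothing to reconstruct
    if not reliable.any():
        out[missing] = mean[missing]    # no evidence: fall back to the mean
        return out

    cov_oo = cov[np.ix_(reliable, reliable)]   # observed-observed block
    cov_mo = cov[np.ix_(missing, reliable)]    # missing-observed block

    # E[x_m | x_o] = mean_m + cov_mo @ inv(cov_oo) @ (x_o - mean_o)
    weights = np.linalg.solve(cov_oo, x[reliable] - mean[reliable])
    out[missing] = mean[missing] + cov_mo @ weights
    return out

# Usage: two correlated components, the second one marked unreliable.
example = reconstruct_missing(
    x=[2.0, 0.0],
    reliable=[True, False],
    mean=[0.0, 0.0],
    cov=[[1.0, 0.5], [0.5, 1.0]],
)
```

In practice the mask would come from a noise or SNR estimate, and the model parameters from clean training data; both are taken as given here.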

Pages: 3918-3930

Number of pages: 13
