Noise robust voice activity detection based on switching Kalman filter

Times cited: 27

Authors:

Fujimoto, Masakiyo [1]

Ishizuka, Kentaro [1]

Affiliation:

[1] NTT Corp, NTT Commun Sci Lab, Kyoto 6190237, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2008 / Vol. E91-D / No. 3

Keywords:

voice activity detection; statistical model; switching Kalman filter; noisy environment; CENSREC-1-C

DOI:

10.1093/ietisy/e91-d.3.467

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

This paper addresses the problem of voice activity detection (VAD) in noisy environments. The proposed VAD method is based on a statistical model approach and estimates the statistical models sequentially, without a priori knowledge of the noise. Specifically, the method constructs a clean speech/silence state transition model beforehand and, as the signal is observed, sequentially adapts this model to the noisy environment by using a switching Kalman filter. We carried out two evaluations. First, in simulated noise environments, the proposed method significantly outperformed conventional methods in VAD accuracy. Second, we evaluated the proposed method on CENSREC-1-C, a VAD evaluation framework; the results revealed that it significantly outperforms the CENSREC-1-C baseline in VAD accuracy in real environments. In addition, we confirmed that the proposed method helps to improve the accuracy of concatenated speech recognition in real environments.
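The abstract describes the switching Kalman filter only at a high level, and this record does not give the paper's actual observation model. As a rough illustration of the general idea, here is a minimal toy sketch of a two-state (silence/speech) switching Kalman filter used for frame-level VAD. It is not the authors' method. Every modeling choice is our own assumption: each frame is reduced to a single log-power value in dB; the noise level follows a Gaussian random walk; in the silence state the observation equals the noise level plus Gaussian error, and in the speech state it equals the noise level plus a fixed offset. The two-component mixture is collapsed after every frame with the first-order generalized pseudo-Bayesian (GPB1) approximation. The function name `skf_vad` and all parameter values are hypothetical.

```python
import math


def log_gaussian(x, var):
    """Log density of a zero-mean Gaussian with variance `var`, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def skf_vad(frames_db, noise_init_db=40.0, noise_init_var=10.0,
            drift_var=0.01, offsets_db=(0.0, 15.0), obs_vars=(4.0, 25.0),
            trans=((0.95, 0.05), (0.10, 0.90)), prior=(0.9, 0.1)):
    """Toy two-state switching Kalman filter VAD (hypothetical model and parameters).

    State 0 = silence, state 1 = speech. Returns (speech_flags, noise_mean, noise_var).
    """
    mean, var = noise_init_db, noise_init_var
    probs = list(prior)
    flags = []
    for y in frames_db:
        # Predict the discrete state with the Markov transition matrix.
        pred = [sum(probs[i] * trans[i][j] for i in range(2)) for j in range(2)]
        # Predict the noise level: a random walk shared by both states.
        var_pred = var + drift_var
        means, variances, log_w = [], [], []
        for j in range(2):
            # State-conditional Kalman update for y = noise + offset_j + error_j.
            innov = y - (mean + offsets_db[j])
            s = var_pred + obs_vars[j]
            gain = var_pred / s
            means.append(mean + gain * innov)
            variances.append((1.0 - gain) * var_pred)
            log_w.append(math.log(pred[j]) + log_gaussian(innov, s))
        # Normalize the state posteriors in the log domain to avoid underflow.
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        probs = [wi / total for wi in w]
        # GPB1 collapse: moment-match the two Gaussians to a single Gaussian.
        mean = sum(p * m for p, m in zip(probs, means))
        var = sum(p * (v + (m - mean) ** 2)
                  for p, v, m in zip(probs, variances, means))
        flags.append(probs[1] > 0.5)
    return flags, mean, var


if __name__ == "__main__":
    # Synthetic input: 40 dB noise floor, a 5-frame burst at 55 dB, +/-1 dB jitter.
    frames = [(55.0 if 10 <= t < 15 else 40.0) + (1.0 if t % 2 == 0 else -1.0)
              for t in range(25)]
    flags, noise_mean, noise_var = skf_vad(frames)
    print("".join("S" if f else "." for f in flags))  # prints ..........SSSSS..........
```

The noise estimate stays near 40 dB through the synthetic burst. Under the speech hypothesis, the assumed 15 dB offset absorbs most of the rise in frame power, so speech frames barely move the noise estimate. Silence frames, by contrast, let the filter track slow drift in the noise floor.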

Pages: 467-477

Number of pages: 11

References (14 in total):

[1] [Anonymous], KALMAN FILTERING THE

[2] A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking [J].

Arulampalam, MS; Maskell, S; Gordon, N; Clapp, T.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2002, 50(02): 174-188

[3] Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator [J].

Ephraim, Y; Malah, D.

IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1984, 32(06): 1109-1121

[4] ETSI, 2005, ETSI ES 202 050 V1.1.4

[5] Ishizuka K, 2006, P SAPA 06 SEPT, P65

[6] Ishizuka K, 2006, INT CONF ACOUST SPEE, P789

[7] Nakamura A, 1996, P ICSLP, V4, P2199

[8] AURORA-2J: An evaluation framework for Japanese noisy speech recognition [J].

Nakamura, S; Takeda, K; Yamamoto, K; Yamada, T; Kuroiwa, S; Kitaoka, N; Nishiura, T; Sasou, A; Mizumachi, M; Miyajima, C; Fujimoto, M; Endo, T.

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D(03): 535-544

[9] Robust voice activity detection using higher-order statistics in the LPC residual domain [J].

Nemer, E; Goubran, R; Mahmoud, S.

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9(03): 217-231

[10] Threshold selection method from gray-level histograms [J].

Otsu, N.

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1979, 9(01): 62-66
