Using Artificial Neural Network For Robust Voice Activity Detection Under Adverse Conditions

Cited by: 0

Authors:

Pham, Tuan V. [1]

Tang, Chien T. [1]

Stadtschnitzer, Michael [2]

Affiliations:

[1] Univ Danang, Univ Technol, Elect & Telecomm Engr Dept, Danang, Vietnam

[2] Graz Univ Technol, Graz Signal Proc & Speech Comm Inst, Inst Appl Syst Technol, JOANNEUM Res, Graz, Austria

Source:

2009 IEEE-RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES: RESEARCH, INNOVATION AND VISION FOR THE FUTURE | 2009

Keywords:

MODEL; RECOGNITION; VAD;

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Subject classification code:

0812 ;

Abstract:

We present an approach to model-based voice activity detection (VAD) for harsh environments. Using mel-frequency cepstral coefficient (MFCC) features extracted from clean and noisy speech samples, an artificial neural network is trained optimally to provide a reliable model. The study has three main aspects. First, in addition to the developed model, recent state-of-the-art VAD methods are analyzed extensively. Second, we present an optimization procedure for neural network training, including evaluation of the trained network's performance with appropriate measures. Third, we provide a large set of empirical results on the noisy TIMIT and SNOW corpora, covering different types of noise at different signal-to-noise ratios. We evaluate the resulting VAD model on the noisy corpora and compare it against state-of-the-art VAD methods such as ITU-T Rec. G.729 Annex B, ETSI AFE ES 202 050, and recent promising VAD algorithms. Results show that: (i) the proposed neural network classifier using MFCC features achieves robustly high scores under different noisy conditions; (ii) the proposed model is superior to the other VAD methods in terms of various classification measures; (iii) the developed VAD algorithm remains robust when tested in a completely mismatched environment.
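As an illustration of what comparing VAD methods "in terms of various classification measures" can look like in practice, the following minimal sketch scores frame-level speech/non-speech decisions against reference labels. The abstract does not name the measures the authors used; the ones computed here (accuracy, speech hit rate, false alarm rate, precision) are common choices and are an assumption. The function name `vad_scores` and the example label sequences are hypothetical.

```python
def vad_scores(reference, decisions):
    """Score frame-level VAD decisions against reference labels.

    Both arguments are equal-length sequences of 0/1 values,
    where 1 marks a speech frame and 0 a non-speech frame.
    The choice of measures is an assumption, not taken from the paper.
    """
    if len(reference) != len(decisions):
        raise ValueError("reference and decisions must have the same length")

    pairs = list(zip(reference, decisions))
    tp = sum(1 for r, d in pairs if r == 1 and d == 1)  # speech detected as speech
    tn = sum(1 for r, d in pairs if r == 0 and d == 0)  # non-speech kept as non-speech
    fp = sum(1 for r, d in pairs if r == 0 and d == 1)  # non-speech flagged as speech
    fn = sum(1 for r, d in pairs if r == 1 and d == 0)  # speech missed

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "speech_hit_rate": ratio(tp, tp + fn),
        "false_alarm_rate": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical reference labels and classifier output for eight frames.
reference = [0, 0, 1, 1, 1, 0, 1, 0]
decisions = [0, 1, 1, 1, 0, 0, 1, 0]
scores = vad_scores(reference, decisions)
```

Reporting the speech hit rate and the false alarm rate separately, rather than accuracy alone, is useful because speech and non-speech frames are often unevenly represented in a corpus.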


Pages: 35 / +

Page count: 2
