ROBUST VOICE ACTIVITY DETECTION USING EMPIRICAL MODE DECOMPOSITION AND MODULATION SPECTRUM ANALYSIS

Times cited: 0
Authors
Kanai, Yasuaki [1]
Unoki, Masashi [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi, Ishikawa 9231292, Japan
Source
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING | 2012
Keywords
voice activity detection; empirical mode decomposition; modulation spectrum analysis;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Voice activity detection (VAD) is used to detect speech/non-speech periods in observed signals. However, current VAD techniques have a serious problem: the accuracy of speech-period detection drops drastically for noisy speech and/or for mixtures of speech and non-speech sounds such as music and environmental sounds. VAD therefore needs to be robust enough to detect speech periods accurately in these situations. This paper proposes an approach to robust VAD that uses empirical mode decomposition (EMD) and modulation spectrum analysis (MSA) to resolve these problems. The method first reduces background noise by using EMD, without estimating the SNR (noise conditions), and then determines speech/non-speech periods by using MSA. Three experiments on VAD in real environments were conducted to evaluate the proposed method by comparing it with typical methods (Otsu's and G.729B). The results demonstrated that the proposed method detected speech periods more accurately than the typical methods.
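For orientation, one of the comparison methods named in the abstract, Otsu's threshold selection, can be sketched as a simple energy-based VAD. This is not the proposed EMD/MSA method, and the record does not say which feature the authors thresholded. The choice of frame log-energy, the frame length, hop size, bin count, and the function names `otsu_threshold`, `frame_log_energy`, and `energy_vad` are all assumptions made for illustration.

```python
import numpy as np


def otsu_threshold(values, n_bins=64):
    """Pick the threshold that maximizes between-class variance (Otsu, 1979)."""
    hist, edges = np.histogram(values, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()                 # bin probabilities
    w0 = np.cumsum(p)                     # probability of the lower class
    w1 = 1.0 - w0                         # probability of the upper class
    mu = np.cumsum(p * centers)           # cumulative first moment
    mu_total = mu[-1]
    valid = (w0 > 1e-12) & (w1 > 1e-12)   # skip splits with an empty class
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b))
    return edges[k + 1]                   # upper edge of the last lower-class bin


def frame_log_energy(signal, frame_len=400, hop=160):
    """Log-energy in dB per frame (assumed feature; 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)


def energy_vad(signal, frame_len=400, hop=160):
    """Flag frames whose log-energy exceeds the Otsu threshold as active."""
    energy = frame_log_energy(signal, frame_len, hop)
    return energy > otsu_threshold(energy)


# Synthetic check: weak noise with a louder tone in the middle second.
fs = 16000
rng = np.random.default_rng(0)
signal = 0.01 * rng.standard_normal(3 * fs)
t = np.arange(fs) / fs
signal[fs:2 * fs] += 0.5 * np.sin(2 * np.pi * 220 * t)
flags = energy_vad(signal)
```

A baseline of this kind depends on active and inactive frames forming two separable energy clusters, which is the assumption that breaks down at low SNR or over music, the conditions the abstract identifies as problematic.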
Pages: 400-404
Number of pages: 5
Related papers
14 in total
[1]  
Atlas L., 2007, P INTERSPEECH2007 TU
[2]   ITU-T recommendation G.729 Annex B: A silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications [J].
Benyassine, A ;
Shlomot, E ;
Su, HY ;
Massaloux, D ;
Lamblin, C ;
Petit, JP .
IEEE COMMUNICATIONS MAGAZINE, 1997, 35 (09) :64-73
[3]  
Goto M., 2004, Transactions of the Information Processing Society of Japan, V45, P728
[4]   The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis [J].
Huang, NE ;
Shen, Z ;
Long, SR ;
Wu, MLC ;
Shih, HH ;
Zheng, QN ;
Yen, NC ;
Tung, CC ;
Liu, HH .
PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES, 1998, 454 (1971) :903-995
[5]  
Ishizuka K, 2006, INT CONF ACOUST SPEE, P789
[6]  
Kanedera N., 2001, Transactions of the Institute of Electronics, Information and Communication Engineers D-II, VJ84D-II, P1261
[7]  
Kanedera N., 1997, P EUR, P1079
[8]  
Lu X., 2011, P INTERSPEECH2011, P2653
[9]   THRESHOLD SELECTION METHOD FROM GRAY-LEVEL HISTOGRAMS [J].
OTSU, N .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1979, 9 (01) :62-66
[10]  
Ramirez J. M., 2007, ROBUST SPEECH RECOGN, V6, P1, DOI 10.5772/4740