ON USING SPECTRAL GRADIENT IN CONDITIONAL MAP CRITERION FOR ROBUST VOICE ACTIVITY DETECTION

Cited by: 0
Authors
Choi, Jae-Hun [1]
Chang, Joon-Hyuk [1]
Affiliation
[1] Hanyang Univ, Sch Elect Engn, Seoul 133791, South Korea
Source
PROCEEDINGS OF THE 3RD IEEE INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC 2012) | 2012
Keywords
Voice activity detection; Spectral gradient; Conditional MAP; Likelihood ratio test;
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose a novel approach that improves statistical model-based voice activity detection (VAD) by using a modified conditional maximum a posteriori (MAP) criterion that incorporates the spectral gradient. The proposed conditional MAP criterion takes into account not only the voice activity decision in the previous frame, as in Ref. [1], but also the spectral gradient of the observed spectra between the current frame and past frames, so as to exploit the inter-frame correlation of voice activity efficiently. As a result, the proposed VAD uses six separate thresholds in the likelihood ratio test (LRT), chosen adaptively depending on both the previous VAD result and the estimated spectral gradient parameter. Experimental results demonstrate that the proposed approach outperforms the previous conditional MAP-based method.
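The threshold selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the six thresholds are partitioned, so the sketch assumes two previous decisions times three spectral gradient classes (falling, flat, rising). The threshold values, the `eps` dead zone, and all function names are hypothetical placeholders, and the per-frame likelihood ratios and gradients are taken as given inputs.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold table: (previous decision, gradient class) -> LRT threshold.
# ASSUMPTION: the 2 x 3 partition and all values are illustrative, not from the paper.
THRESHOLDS: Dict[Tuple[int, str], float] = {
    (0, "falling"): 1.6,
    (0, "flat"): 1.4,
    (0, "rising"): 1.2,
    (1, "falling"): 1.0,
    (1, "flat"): 0.8,
    (1, "rising"): 0.6,
}


def gradient_class(gradient: float, eps: float = 0.1) -> str:
    """Quantize a spectral gradient estimate into three classes (eps is assumed)."""
    if gradient > eps:
        return "rising"
    if gradient < -eps:
        return "falling"
    return "flat"


def vad_decision(likelihood_ratio: float, prev_decision: int, gradient: float) -> int:
    """Return 1 (speech) if the likelihood ratio exceeds the selected threshold."""
    threshold = THRESHOLDS[(prev_decision, gradient_class(gradient))]
    return 1 if likelihood_ratio > threshold else 0


def run_vad(likelihood_ratios: List[float], gradients: List[float]) -> List[int]:
    """Apply the decision rule frame by frame, feeding back the previous decision."""
    decisions: List[int] = []
    prev = 0  # assume the signal starts in a non-speech state
    for lr, grad in zip(likelihood_ratios, gradients):
        prev = vad_decision(lr, prev, grad)
        decisions.append(prev)
    return decisions


if __name__ == "__main__":
    print(run_vad([1.0, 1.3, 0.9, 0.5], [0.0, 0.5, 0.0, -0.5]))
```

In this sketch, a lower threshold after a speech decision or a rising gradient makes a speech decision easier to reach, which is one way the inter-frame correlation of voice activity could be exploited.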
Pages: 370-374
Page count: 5