Convex Combination of Multiple Statistical Models With Application to VAD

Cited by: 20
Authors
Petsatodis, Theodoros [1, 2]
Boukis, Christos [2]
Talantzis, Fotios [2, 3]
Tan, Zheng-Hua [4 ]
Prasad, Ramjee [1 ]
Affiliations
[1] Aalborg Univ, Ctr TeleInFrastruktur CTIF, DK-9220 Aalborg, Denmark
[2] Athens Informat Technol Ctr, Auton & Grid Comp Grp, Athens 19002, Greece
[3] Univ London Imperial Coll Sci Technol & Med, London SW7 2AZ, England
[4] Aalborg Univ, Dept Elect Syst, DK-9220 Aalborg, Denmark
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011 / Vol. 19 / No. 08
Keywords
Classification; convex combination; statistical models; voice activity detection (VAD); VOICE
DOI
10.1109/TASL.2011.2131131
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper proposes a robust voice activity detector (VAD) based on the observation that the distribution of speech captured with far-field microphones is highly varying, depending on the noise and reverberation conditions. The proposed VAD employs a convex combination scheme comprising three statistical distributions (a Gaussian, a Laplacian, and a two-sided Gamma) to effectively model captured speech. This scheme shows increased ability to adapt to dynamic acoustic environments. The contribution of each distribution to this convex combination is automatically adjusted based on the statistical characteristics of the instantaneous audio input. To further improve the performance of the system, an adaptive threshold is introduced, while a decision-smoothing scheme caters to the intra-frame correlation of speech signals. Extensive experiments under realistic scenarios support the proposed approach of combining several models for increased adaptation and performance.
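The convex combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the paper's weight-adaptation rule, so the weights here are supplied by the caller, and all function names (`gaussian_pdf`, `laplacian_pdf`, `gamma_pdf`, `normalize_weights`, `convex_combination_pdf`) are hypothetical. The sketch assumes zero-mean densities sharing one standard deviation `sigma`, and a two-sided Gamma with shape parameter 1/2.

```python
import math

def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def laplacian_pdf(x, sigma):
    """Zero-mean Laplacian density; scale b chosen so the variance is sigma**2."""
    b = sigma / math.sqrt(2.0)
    return math.exp(-abs(x) / b) / (2.0 * b)

def gamma_pdf(x, sigma):
    """Two-sided Gamma density with shape 1/2 (an assumption of this sketch);
    rate beta chosen so the variance is sigma**2."""
    beta = math.sqrt(3.0) / (2.0 * sigma)
    ax = max(abs(x), 1e-12)  # the density is unbounded at x = 0, so clamp |x|
    return math.sqrt(beta) / (2.0 * math.sqrt(math.pi)) * ax ** -0.5 * math.exp(-beta * ax)

def normalize_weights(scores):
    """Turn non-negative scores into convex weights (non-negative, summing to 1).
    How the scores are obtained is left open; this is a placeholder for an
    adaptation rule, not the rule used in the paper."""
    if any(s < 0 for s in scores) or sum(scores) <= 0:
        raise ValueError("scores must be non-negative with a positive sum")
    total = sum(scores)
    return [s / total for s in scores]

def convex_combination_pdf(x, sigma, weights):
    """Weighted sum of the three densities under convex weights."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    components = (gaussian_pdf(x, sigma), laplacian_pdf(x, sigma), gamma_pdf(x, sigma))
    return sum(w * p for w, p in zip(weights, components))

# Example: favor the Gaussian model twice as much as each of the other two.
weights = normalize_weights([2.0, 1.0, 1.0])
mixture_value = convex_combination_pdf(0.7, 1.0, weights)
```

Because the weights are non-negative and sum to one, the combined density always lies between the smallest and largest of the three component densities at any given sample value.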
Pages: 2314-2327
Page count: 14