Visually Derived Wiener Filters for Speech Enhancement

被引:58
作者
Almajai, Ibrahim [1 ]
Milner, Ben [1 ]
机构
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
来源
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2011年 / 19卷 / 06期
关键词
Audio-visual; maximum a posteriori; speech enhancement; Wiener filter; MODELS; NOISE;
D O I
10.1109/TASL.2010.2096212
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
The aim of this work is to examine whether visual speech information can be used to enhance audio speech that has been contaminated by noise. First, an analysis of audio and visual speech features is made, which identifies the pair with highest audio-visual correlation. The study also reveals that higher audio-visual correlation exists within individual phoneme sounds rather than globally across all speech. This correlation is exploited in the proposal of a visually derived Wiener filter that obtains clean speech and noise power spectrum statistics from visual speech features. Clean speech statistics are estimated from visual features using a maximum a posteriori framework that is integrated within the states of a network of hidden Markov models to provide phoneme localization. Noise statistics are obtained through a novel audio-visual voice activity detector which utilizes visual speech features to make robust speech/nonspeech classifications. The effectiveness of the visually derived Wiener filter is evaluated subjectively and objectively and is compared with three different audio-only enhancement methods over a range of signal-to-noise ratios.
引用
收藏
页码:1642 / 1651
页数:10
相关论文
共 19 条
[1]  
ALMAJAI I, 2007, P ICASSP APR, V4, P585
[2]  
ALMAJAI I, 2006, P INT SEP, P2470
[3]  
ALMAJAI I, 2008, P EUSIPCO LAUS SWITZ
[4]  
[Anonymous], 2007, Speech Enhancement: Theory and Practice
[5]  
[Anonymous], P IEEE INT C AC SPEE
[6]   Voice activity detection based on multiple statistical models [J].
Chang, Joon-Hyuk ;
Kim, Nam Soo ;
Mitra, Sanjit K. .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2006, 54 (06) :1965-1976
[7]   Active appearance models [J].
Cootes, TF ;
Edwards, GJ ;
Taylor, CJ .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2001, 23 (06) :681-685
[8]   ON THE APPLICATION OF HIDDEN MARKOV-MODELS FOR ENHANCING NOISY SPEECH [J].
EPHRAIM, Y ;
MALAH, D ;
JUANG, BH .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (12) :1846-1856
[9]   SPEECH ENHANCEMENT USING A MINIMUM MEAN-SQUARE ERROR LOG-SPECTRAL AMPLITUDE ESTIMATOR [J].
EPHRAIM, Y ;
MALAH, D .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1985, 33 (02) :443-445
[10]  
*ETSI, 2003, 202212 ES ETSI STQ A