Novel Two-Stage Audiovisual Speech Filtering in Noisy Environments

Cited by: 9
Authors
Abel, Andrew [1]
Hussain, Amir [1]
Affiliations
[1] Univ Stirling, Sch Nat Sci, Stirling FK9 4AL, Scotland
Keywords
Speech enhancement; Multimodal speech filtering; Audiovisual speech processing; Phonetic information; Enhancement; Lips; Movements; Voice; Cues
DOI
10.1007/s12559-013-9231-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, we build on previous work by the authors and present a novel two-stage audiovisual speech enhancement system, making use of audio-only beamforming, automatic lip tracking, and pre-processing with visually derived Wiener speech filtering. Initial results have demonstrated that this two-stage multimodal speech enhancement approach can produce positive results with noisy speech mixtures that conventional audio-only beamforming would struggle to cope with, such as in very noisy environments with a very low signal-to-noise ratio, and when the type of noise is difficult for audio-only beamforming to process.
Pages: 200-217
Number of pages: 18