Visually-derived Wiener filters for speech enhancement

Times cited: 0
Authors
Almajai, Ibrahim [1]
Milner, Ben [1]
Darch, Jonathan [1]
Vaseghi, Saeed [2]
Affiliations
[1] Univ E Anglia, Sch Comp Sci, Norwich NR4 7TJ, Norfolk, England
[2] Brunel Univ, Dept Elect & Comp Engn, Uxbridge, Middx, England
Source
2007 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. IV, Pts. 1-3, 2007
Keywords
audio-visual; Wiener filter; speech enhancement; HMM; GMM
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This work begins by examining the correlation between audio and visual speech features and finds that the correlation is higher within individual phoneme sounds than globally across all speech. Exploiting this correlation, a visually-derived Wiener filter is proposed in which estimates of the clean speech power spectrum are obtained from visual speech features. Two methods of obtaining these estimates are considered: first, a global estimate from a single Gaussian mixture model (GMM); second, phoneme-specific estimates from a hidden Markov model (HMM)-GMM structure. Measurements of estimation accuracy show that the phoneme-specific (HMM-GMM) system yields lower estimation errors than the global (GMM) system. Finally, the effectiveness of visually-derived Wiener filtering for speech enhancement is examined.
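The abstract does not spell out the estimator, so the following is only a minimal Python/NumPy sketch of the idea under stated assumptions that are not taken from the paper: a joint GMM trained on stacked vectors z = [v; a] of visual features v and clean power-spectrum values a, an MMSE (conditional-mean) estimate of a given v, and a noise power spectrum `noise_psd` estimated separately (for example from non-speech frames). All function names are hypothetical. The phoneme-specific (HMM-GMM) variant would apply the same estimator with a per-phoneme GMM selected by an HMM decoder, which is not shown here.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of the multivariate Gaussian N(mean, cov) at x."""
    diff = x - mean
    # Solve cov @ y = diff rather than forming an explicit inverse.
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha)


def estimate_clean_spectrum(v, weights, means, covs, dv):
    """Hypothetical MMSE estimate of the clean-spectrum part `a` of a joint
    GMM over z = [v; a], given only the visual feature vector v (length dv).

    weights: (K,)  means: (K, dv + da)  covs: (K, dv + da, dv + da)
    """
    log_post, cond_means = [], []
    for w, mu, cov in zip(weights, means, covs):
        mu_v, mu_a = mu[:dv], mu[dv:]
        s_vv, s_av = cov[:dv, :dv], cov[dv:, :dv]
        # Unnormalised log responsibility of this component given v alone.
        log_post.append(np.log(w) + gaussian_logpdf(v, mu_v, s_vv))
        # E[a | v, k] = mu_a + S_av S_vv^{-1} (v - mu_v) for a joint Gaussian.
        cond_means.append(mu_a + s_av @ np.linalg.solve(s_vv, v - mu_v))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())  # numerically stable softmax
    post /= post.sum()
    # Posterior-weighted sum of the per-component conditional means.
    return post @ np.array(cond_means)


def wiener_gain(clean_psd, noise_psd, eps=1e-10):
    """Per-bin Wiener gain H = S / (S + N). S is clamped at zero because a
    Gaussian-based estimate of a power spectrum can go negative."""
    s = np.maximum(clean_psd, 0.0)
    return s / (s + noise_psd + eps)


if __name__ == "__main__":
    # Toy single-component model: 1 visual dimension, 2 spectral bins.
    weights = np.array([1.0])
    means = np.array([[0.0, 1.0, 2.0]])
    covs = np.array([[[1.0, 0.5, 0.2],
                      [0.5, 1.0, 0.0],
                      [0.2, 0.0, 1.0]]])
    s_hat = estimate_clean_spectrum(np.array([2.0]), weights, means, covs, dv=1)
    noise_psd = np.array([2.0, 2.4])   # assumed, separately estimated noise
    noisy_psd = s_hat + noise_psd      # hypothetical noisy frame
    enhanced = wiener_gain(s_hat, noise_psd) * noisy_psd
    print(s_hat, enhanced)
```

The conditional mean averages over all mixture components; a MAP-style estimate taken from the single most likely component is an equally plausible alternative, and which one the authors actually use should be checked against the full paper.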
Pages: 585+
Page count: 2