Feature Vector Normalization with Combined Standard and Throat Microphones for Robust ASR

Cited by: 0
Authors
Buera, Luis [1]
Miguel, Antonio [1]
Saz, Oscar [1]
Ortega, Alfonso [1]
Lleida, Eduardo [1]
Affiliations
[1] Univ Zaragoza, Commun Technol Grp GTC, E-50009 Zaragoza, Spain
Source
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 | 2008
Keywords
Throat microphone; robust speech recognition; feature vector normalization
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an on-line unsupervised compensation technique for robust speech recognition that combines standard and throat microphone feature vectors. The solution, called Multi-Environment Model-based Linear Normalization with Throat microphone information (MEMLINT), is an extension of the MEMLIN formulation. The noisy standard microphone space and the throat microphone space are modelled as GMMs, and a set of linear transformations, one associated with each pair of Gaussians (one from each GMM), is learnt from stereo training data. In addition, to compensate for kinds of degradation that MEMLINT does not consider, we propose to use it jointly with an on-line unsupervised acoustic model adaptation method based on rotation transformations over an expanded HMM-state space (augMented stAte space acousTic dEcoder, MATE). Experiments on a database recorded by the authors show that the proposed approach significantly outperforms the single-microphone approach.
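The idea of learning one correction per pair of Gaussians from stereo data can be illustrated with a toy sketch. This is not the authors' formulation: it assumes one-dimensional features, fixed (not trained) GMMs, and a bias-only correction instead of a full linear transformation. All names (`posteriors`, `learn_pair_biases`, `normalize`) and all numeric values are hypothetical.

```python
import math

def gaussian_pdf(v, mean, var):
    """Density of a 1-D Gaussian at v."""
    return math.exp(-0.5 * (v - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posteriors(v, gmm):
    """Posterior probability of each component; gmm is a list of (weight, mean, var)."""
    likes = [w * gaussian_pdf(v, m, s) for (w, m, s) in gmm]
    total = sum(likes)
    return [l / total for l in likes]

def learn_pair_biases(noisy, throat, clean, gmm_noisy, gmm_throat):
    """Assumed simplification: one bias per Gaussian pair (i, j), estimated as the
    posterior-weighted mean of (noisy - clean) over the stereo training frames."""
    n_i, n_j = len(gmm_noisy), len(gmm_throat)
    num = [[0.0] * n_j for _ in range(n_i)]
    den = [[0.0] * n_j for _ in range(n_i)]
    for y, z, x in zip(noisy, throat, clean):
        p_y = posteriors(y, gmm_noisy)
        p_z = posteriors(z, gmm_throat)
        for i in range(n_i):
            for j in range(n_j):
                w = p_y[i] * p_z[j]
                num[i][j] += w * (y - x)
                den[i][j] += w
    return [[num[i][j] / den[i][j] if den[i][j] > 0.0 else 0.0
             for j in range(n_j)] for i in range(n_i)]

def normalize(y, z, gmm_noisy, gmm_throat, biases):
    """Estimate the clean feature by subtracting the posterior-weighted bias."""
    p_y = posteriors(y, gmm_noisy)
    p_z = posteriors(z, gmm_throat)
    correction = sum(p_y[i] * p_z[j] * biases[i][j]
                     for i in range(len(p_y)) for j in range(len(p_z)))
    return y - correction

# Hypothetical stereo data: clean frames, noisy standard-mic frames, throat-mic frames.
clean = [0.0, 1.0, 4.0, 5.0]
noisy = [c + 2.0 for c in clean]      # toy degradation: constant offset
throat = [0.5 * c for c in clean]     # toy throat-mic mapping
gmm_noisy = [(0.5, 2.5, 1.0), (0.5, 6.5, 1.0)]
gmm_throat = [(0.5, 0.25, 0.5), (0.5, 2.25, 0.5)]

biases = learn_pair_biases(noisy, throat, clean, gmm_noisy, gmm_throat)
print(normalize(5.0, 1.5, gmm_noisy, gmm_throat, biases))
```

In this sketch the throat-microphone posterior only reweights which pair-specific bias is applied; a fuller treatment would train both GMMs from data and use multi-dimensional cepstral features.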
Pages: 1289-1292
Number of pages: 4