A Vector Space Approach to Environment Modeling for Robust Speech Recognition

Authors
Tsao, Yu [1]
Lee, Chin-Hui [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Keywords
acoustic modeling; environment adaptation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a vector space approach to characterizing environments for robust speech recognition. We represent a given environment by a super-vector formed by concatenating the mean vectors of all Gaussian mixture components of the state observation densities of all hidden Markov models trained in that environment. A super-vector for a new environment can then be obtained in one of two ways: by interpolating a collection of super-vectors trained on many real or simulated environments, or by transforming an anchor super-vector for a specific environment, such as the clean condition. At a signal-to-noise ratio (SNR) of 5 dB, with only two adaptation utterances, both the interpolation-based and the transformation-based approaches achieve a significant error rate reduction of close to 47% from a baseline system with cepstral mean subtraction (CMS). When N-best information is incorporated to perform unsupervised adaptation at 5 dB SNR with the same two utterances, we achieve a relative error reduction of about 40%, close to that achieved in the supervised mode.
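The super-vector construction and the two ways of deriving a new environment described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the nested-list layout of the HMM set, the function names `build_supervector`, `interpolate`, and `transform_anchor`, the hand-picked interpolation weights, and the element-wise scale-and-bias form of the transformation. The abstract does not say how the weights or the transformation are estimated from the adaptation utterances, so that step is left out.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: model name -> states -> Gaussian components -> mean vector.
HmmSet = Dict[str, List[List[List[float]]]]


def build_supervector(hmms: HmmSet) -> List[float]:
    """Concatenate every Gaussian mean of every state of every HMM."""
    supervector: List[float] = []
    for name in sorted(hmms):  # fixed model order keeps dimensions aligned
        for state in hmms[name]:
            for mean in state:
                supervector.extend(mean)
    return supervector


def interpolate(supervectors: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Weighted sum of environment super-vectors of equal dimension."""
    if len(supervectors) != len(weights):
        raise ValueError("need exactly one weight per super-vector")
    dim = len(supervectors[0])
    if any(len(sv) != dim for sv in supervectors):
        raise ValueError("super-vectors must have equal length")
    return [sum(w * sv[i] for sv, w in zip(supervectors, weights))
            for i in range(dim)]


def transform_anchor(anchor: Sequence[float],
                     scale: Sequence[float],
                     bias: Sequence[float]) -> List[float]:
    """Element-wise scale and bias applied to an anchor super-vector.

    Assumed form of the transformation, chosen only for illustration.
    """
    if not (len(anchor) == len(scale) == len(bias)):
        raise ValueError("anchor, scale and bias must have equal length")
    return [s * a + b for a, s, b in zip(anchor, scale, bias)]


# Toy model set: one HMM, two states, one 2-dimensional Gaussian per state.
clean_hmms: HmmSet = {"a": [[[1.0, 2.0]], [[3.0, 4.0]]]}
noisy_hmms: HmmSet = {"a": [[[3.0, 4.0]], [[5.0, 6.0]]]}

clean_sv = build_supervector(clean_hmms)   # [1.0, 2.0, 3.0, 4.0]
noisy_sv = build_supervector(noisy_hmms)   # [3.0, 4.0, 5.0, 6.0]

# New environment halfway between the two training environments.
midpoint_sv = interpolate([clean_sv, noisy_sv], [0.5, 0.5])

# New environment obtained by shifting the clean anchor.
shifted_sv = transform_anchor(clean_sv, [1.0] * 4, [2.0] * 4)
```

In this toy case `midpoint_sv` is `[2.0, 3.0, 4.0, 5.0]` and `shifted_sv` equals `noisy_sv`. Both routes yield a vector with the same dimension as the training super-vectors, so its entries can be written back as the Gaussian means of the adapted models.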
Pages: 785-788
Number of pages: 4