A Vector Space Approach to Environment Modeling for Robust Speech Recognition

Authors
Tsao, Yu [1 ]
Lee, Chin-Hui [1 ]
Affiliation
[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Keywords
acoustic modeling; environment adaptation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a vector space approach to characterizing environments for robust speech recognition. A given environment is represented by a super-vector formed by concatenating the mean vectors of all Gaussian mixture components in the state observation densities of all hidden Markov models trained in that environment. Super-vectors for new environments can then be obtained in one of two ways: by interpolating a collection of super-vectors trained on many real or simulated environments, or by transforming an anchor super-vector for a specific environment, such as a clean condition. At a 5 dB signal-to-noise ratio (SNR), both the interpolation-based and the transformation-based approaches achieve a significant error rate reduction of close to 47% over a baseline system with cepstral mean subtraction (CMS), using only two adaptation utterances. When N-best information is incorporated to perform unsupervised adaptation at 5 dB SNR with the same two utterances, the relative error reduction is about 40%, close to that achieved in the supervised mode.
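The two ways of obtaining a new super-vector described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the `hmm_means` dictionary layout, the function names, the choice of a weighted sum for interpolation, and the scale-plus-bias form of the transformation are all hypothetical. The abstract does not say how the interpolation weights or the transformation parameters are estimated from the adaptation utterances, so the sketch takes them as given.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def build_supervector(hmm_means: Dict[str, Sequence[Sequence[float]]]) -> Vector:
    """Concatenate every Gaussian mean vector of every HMM into one flat vector.

    `hmm_means` is a hypothetical layout: model name -> list of mean vectors,
    with states and mixture components already flattened in a fixed order.
    """
    supervector: Vector = []
    # Sorting the model names keeps dimensions aligned across environments.
    for model_name in sorted(hmm_means):
        for mean in hmm_means[model_name]:
            supervector.extend(mean)
    return supervector


def interpolate(supervectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted sum of environment super-vectors (weights assumed given)."""
    if not supervectors or len(supervectors) != len(weights):
        raise ValueError("need one weight per super-vector")
    dim = len(supervectors[0])
    if any(len(sv) != dim for sv in supervectors):
        raise ValueError("super-vectors must share one dimension")
    return [
        sum(w * sv[i] for w, sv in zip(weights, supervectors))
        for i in range(dim)
    ]


def transform_anchor(anchor: Vector, scale: float, bias: Vector) -> Vector:
    """Map an anchor super-vector to a new environment.

    A scale-plus-bias map is an assumed stand-in for whatever transformation
    the paper actually uses.
    """
    if len(anchor) != len(bias):
        raise ValueError("bias must match the anchor dimension")
    return [scale * a + b for a, b in zip(anchor, bias)]


# Toy data: two models with 2-dimensional means, in two environments.
clean = {"one": [[0.0, 1.0], [2.0, 3.0]], "two": [[4.0, 5.0]]}
noisy = {"one": [[1.0, 2.0], [3.0, 4.0]], "two": [[5.0, 6.0]]}

sv_clean = build_supervector(clean)
sv_noisy = build_supervector(noisy)

# Route 1: interpolate between known environments.
sv_mixed = interpolate([sv_clean, sv_noisy], [0.5, 0.5])

# Route 2: transform the clean anchor toward a new environment.
sv_shifted = transform_anchor(sv_clean, 1.0, [1.0] * len(sv_clean))
```

In this toy setup the super-vector has six dimensions (three mean vectors of two components each). Both routes need only a handful of parameters, which is consistent with the abstract's point that two adaptation utterances suffice.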
Pages: 785-788
Number of pages: 4