Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition

Cited by: 4
Authors
Kang, Byung Ok [1, 2]
Kwon, Oh-Wook [2]
Affiliations
[1] ETRI, SW Content Res Lab, Daejeon, South Korea
[2] Chungbuk Natl Univ, Sch Elect Engn, Cheongju, South Korea
Source
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2016 / Vol. E99D / Issue 03
Keywords
noise-robust speech recognition; acoustic model; GMM combination; non-native speech recognition
DOI
10.1587/transinf.2015EDP7252
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a new method for combining multiple acoustic models in Gaussian mixture model (GMM) spaces for robust speech recognition. Although large vocabulary continuous speech recognition (LVCSR) systems have recently become widespread, they often make egregious recognition errors caused by the unavoidable mismatch in speaking styles or environments between training and real conditions. To handle this problem, a multi-style training approach has conventionally been used, in which a large acoustic model is trained on a large speech database covering various speaking styles and kinds of environmental noise. In this work, by contrast, we combine multiple sub-models trained for different speaking styles or environmental noise into a large acoustic model by maximizing the log-likelihood of the sub-model states that share the same phonetic context and position. The combined acoustic model is then used in a new target system, which is robust to variation in speaking style and to diverse environmental noise. Experimental results show that the proposed method significantly outperforms the conventional methods in two tasks: non-native English speech recognition for second-language learning systems and noise-robust point-of-interest (POI) recognition for car navigation systems.
Pages: 724-730
Number of pages: 7
Related papers
50 in total
  • [41] Sub-band level Histogram Equalization for Robust Speech Recognition
    Joshi, Vikas
    Bilgi, Raghavendra
    Umesh, S.
    Garcia, L.
    Benitez, C.
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 1672 - +
  • [42] Incomplete spectrogram reconstruction with Kalman filter for noise robust speech recognition
    Mohammadi, Arash
    Almasganj, Farshad
    Sadrieh, Nima
    Zandi, Alireza
    2008 3RD INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING, VOLS 1-3, 2008, : 814 - +
  • [43] On the temporal decorrelation of feature parameters for noise-robust speech recognition
    Jung, HY
    Lee, SY
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2000, 8 (04): : 407 - 416
  • [44] Phase Auto Correlation (PAC) features for noise robust speech recognition
    Ikbal, Shajith
    Misra, Hemant
    Hermansky, Hynek
    Magimai-Doss, Mathew
    SPEECH COMMUNICATION, 2012, 54 (07) : 867 - 880
  • [45] ROBUST FRONT-END PROCESSING FOR SPEECH RECOGNITION IN NOISY CONDITIONS
    Das, Biswajit
    Panda, Ashish
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5235 - 5239
  • [46] FILTERING ON THE TEMPORAL PROBABILITY SEQUENCE IN HISTOGRAM EQUALIZATION FOR ROBUST SPEECH RECOGNITION
    Wang, Syu-Siang
    Tsao, Yu
    Hung, Jeih-weih
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7112 - 7116
  • [47] Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech
    Mitra, Vikramjit
    VanHout, Julien
    Wang, Wen
    Bartels, Chris
    Franco, Horacio
    Vergyri, Dimitra
    Alwan, Abeer
    Janin, Adam
    Hansen, John
    Stern, Richard
    Sangwan, Abhijeet
    Morgan, Nelson
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3683 - 3687
  • [48] A frame-based context-dependent acoustic modeling for speech recognition
    Terashima R.
    Zen H.
    Nankaku Y.
    Tokuda K.
    IEEJ Transactions on Electronics, Information and Systems, 2010, 130 (10) : 1856 - 1864+24
  • [49] APPROXIMATED PARALLEL MODEL COMBINATION FOR EFFICIENT NOISE-ROBUST SPEECH RECOGNITION
    Sim, Khe Chai
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7383 - 7387
  • [50] Noise robust speech recognition using F0 contour information
    Iwano, K
    Seki, T
    Furui, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (05): : 1102 - 1109