Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling

Authors
Huemmer, Christian [1]
Maas, Roland [1]
Schwarz, Andreas [1]
Astudillo, Ramon Fernandez [2]
Kellermann, Walter [1]
Affiliations
[1] Univ Erlangen Nurnberg, Multimedia Commun & Signal Proc, Erlangen, Germany
[2] INESC ID Lisboa, Spoken Language Syst Lab, Lisbon, Portugal
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
robust speech recognition; observation uncertainty; numerical sampling; uncertainty decoding; DEEP NEURAL-NETWORKS; SPEECH; ADAPTATION;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
In this article, we propose an uncertainty decoding scheme for DNN-HMM hybrid systems based on numerical sampling. A finite set of samples is drawn from the estimated probability distribution of the acoustic features and subsequently passed through feature transformations/extensions and the deep neural network (DNN). Then, the nonlinearly-transformed feature samples are averaged at the output of the DNN in order to approximate the posterior distribution of the context-dependent Hidden Markov Model (HMM) states. This concept is experimentally verified for the REVERB challenge task using a reverberation-robust DNN-HMM hybrid system: The numerical sampling is performed in the logmelspec domain, where we estimate the posterior distribution of the acoustic features by combining coherence-based Wiener filtering and uncertainty propagation. The experimental results highlight the good performance of the proposed uncertainty decoding scheme with significantly increased recognition accuracy even for a small number of feature samples.
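The sampling-and-averaging step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature posterior is assumed to be a diagonal Gaussian given by a mean and a variance vector, `toy_dnn` is a hypothetical one-hidden-layer stand-in for the acoustic model, all dimensions and weights are made up, and the feature transformations/extensions mentioned in the abstract are omitted.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_dnn(x, params):
    # Hypothetical stand-in for the acoustic model:
    # one tanh hidden layer followed by a softmax over HMM states.
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)
    return softmax(h @ W2 + b2)

def sampled_state_posterior(mean, var, params, num_samples, rng):
    # Draw samples from the (assumed diagonal Gaussian) feature posterior.
    noise = rng.standard_normal((num_samples, mean.shape[0]))
    samples = mean + np.sqrt(var) * noise
    # Pass every sample through the network, then average the outputs
    # to approximate the state posterior under feature uncertainty.
    return toy_dnn(samples, params).mean(axis=0)

# Illustrative sizes and random weights (assumptions, not from the paper).
rng = np.random.default_rng(0)
dim, hidden, states = 24, 16, 5
params = (
    rng.standard_normal((dim, hidden)) * 0.3,
    np.zeros(hidden),
    rng.standard_normal((hidden, states)) * 0.3,
    np.zeros(states),
)
mean = rng.standard_normal(dim)   # estimated feature mean for one frame
var = np.full(dim, 0.5)           # estimated feature variance for one frame

posterior = sampled_state_posterior(mean, var, params, num_samples=20, rng=rng)
point_estimate = toy_dnn(mean[None, :], params)[0]  # conventional decoding
```

With zero variance every sample equals the mean, so the averaged output reduces to the conventional point estimate; with nonzero variance the two generally differ because the network is nonlinear.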
Pages: 3556-3560
Number of pages: 5
Related papers
50 in total
  • [1] A NEW UNCERTAINTY DECODING SCHEME FOR DNN-HMM HYBRID SYSTEMS WITH MULTICHANNEL SPEECH ENHANCEMENT
    Huemmer, Christian
    Schwarz, Andreas
    Maas, Roland
    Barfuss, Hendrik
    Astudillo, Ramon Fernandez
    Kellermann, Walter
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5760 - 5764
  • [2] AN IMPROVED UNCERTAINTY DECODING SCHEME WITH WEIGHTED SAMPLES FOR MULTI-CHANNEL DNN-HMM HYBRID SYSTEMS
    Huemmer, Christian
    Astudillo, Ramon Fernandez
    Kellermann, Walter
    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 31 - 35
  • [3] Combining hybrid DNN-HMM ASR systems with attention-based models using lattice rescoring
    Li, Qiujia
    Zhang, Chao
    Woodland, Philip C.
    SPEECH COMMUNICATION, 2023, 147 : 12 - 21
  • [4] Recognizing the content types of network traffic based on a hybrid DNN-HMM model
    Tan, Xincheng
    Xie, Yi
    Ma, Haishou
    Yu, Shunzheng
    Hu, Jiankun
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2019, 142 : 51 - 62
  • [5] On quantifying the quality of acoustic models in hybrid DNN-HMM ASR
    Dighe, Pranay
    Asaei, Afsaneh
    Bourlard, Herve
    SPEECH COMMUNICATION, 2020, 119 : 24 - 35
  • [6] Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers
    Ochiai, Tsubasa
    Matsuda, Shigeki
    Watanabe, Hideyuki
    Lu, Xugang
    Hori, Chiori
    Kawai, Hisashi
    Katagiri, Shigeru
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (10): : 2431 - 2443
  • [7] DNN-HMM based Automatic Speech Recognition for HRI Scenarios
    Novoa, Jose
    Wuth, Jorge
    Pablo Escudero, Juan
    Fredes, Josue
    Mahu, Rodrigo
    Becerra Yoma, Nestor
    HRI '18: PROCEEDINGS OF THE 2018 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, 2018, : 150 - 159
  • [8] Phonotactic Language Recognition Based on DNN-HMM Acoustic Model
    Liu, Wei-Wei
    Cai, Meng
    Yuan, Hua
    Shi, Xiao-Bei
    Zhang, Wei-Qiang
    Liu, Jia
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 153 - +
  • [9] Syllable based DNN-HMM Cantonese Speech-to-Text System
    Wong, Timothy
    Li, Claire W. Y.
    Lam, Sam
    Chiu, Billy
    Lu, Qin
    Li, Minglei
    Xiong, Dan
    Yu, Roy S.
    Ng, Vincent T. Y.
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3856 - 3862
  • [10] Hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition
    Li, Longfei
    Zhao, Yong
    Jiang, Dongmei
    Zhang, Yanning
    Wang, Fengna
    Gonzalez, Isabel
    Valentin, Enescu
    Sahli, Hichem
    2013 HUMAINE ASSOCIATION CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2013, : 312 - 317