SEGMENT-LEVEL TRAINING OF ANNS BASED ON ACOUSTIC CONFIDENCE MEASURES FOR HYBRID HMM/ANN SPEECH RECOGNITION

Citations: 0
Authors
Dubagunta, S. Pavankumar [1, 2]
Magimai-Doss, Mathew [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Speech recognition; confidence measures; local posterior probability; segment-level training; NEURAL-NETWORKS; MODELS;
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We show that confidence measures estimated from local posterior probabilities can serve as objective functions for training ANNs in hybrid HMM based speech recognition systems. This leads to a segment-level training paradigm that overcomes the limitation of frame-level updates ignoring the sequence structure in speech. We propose measures that train at the state and phone segment levels, while still decoding in the conventional framework. Experimental results on multiple corpora show that such trainings not only yield better systems in terms of performance, but also give additional improvements with sequence discriminative training. These techniques generalise across front-ends and model architectures, and efficiently handle the effect of segment duration variations on the ANN training.
Pages: 6435-6439
Page count: 5