SEGMENT-LEVEL TRAINING OF ANNS BASED ON ACOUSTIC CONFIDENCE MEASURES FOR HYBRID HMM/ANN SPEECH RECOGNITION

Cited by: 0

Authors:

Dubagunta, S. Pavankumar [1, 2]

Magimai-Doss, Mathew [1]

Affiliations:

[1] Idiap Res Inst, Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Speech recognition; confidence measures; local posterior probability; segment-level training; NEURAL-NETWORKS; MODELS;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We show that confidence measures estimated from local posterior probabilities can serve as objective functions for training ANNs in hybrid HMM-based speech recognition systems. This leads to a segment-level training paradigm that overcomes a limitation of frame-level updates, which ignore the sequence structure in speech. We propose measures that train at the state and phone segment levels, while still decoding in the conventional framework. Experimental results on multiple corpora show that such training not only yields better-performing systems, but also gives additional improvements when combined with sequence discriminative training. These techniques generalise across front-ends and model architectures, and efficiently handle the effect of segment duration variations on the ANN training.


Pages: 6435-6439

Page count: 5
