A hierarchical duration model for speech recognition based on the ANGIE framework

Cited by: 5

Authors:

Chung, GY [1]

Seneff, S [1]

Affiliations:

[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA

Source:

SPEECH COMMUNICATION | 1999, Vol. 27, No. 02

Funding:

U.S. National Science Foundation;

Keywords:

duration modelling; prosodic modelling; speech recognition;

DOI:

10.1016/S0167-6393(98)00071-5

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework, a flexible, unified sublexical representation designed for speech applications. The duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure, from which we derive a robust measure of the rate of speech. The model uses two sets of statistical models: a first set based on relative duration between sublexical units, and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore several speech timing phenomena, including the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we demonstrate the utility of durational information for recognition applications. In phonetic recognition, incorporating our model over and above a standard phone duration model yields a relative improvement of up to 7.7%; similarly, in a word spotting task, the figure of merit (FOM) improves from 89.3 to 91.6. (C) 1999 Elsevier Science B.V. All rights reserved.

Pages: 113-134

Page count: 22

