A hierarchical duration model for speech recognition based on the ANGIE framework

Cited by: 5

Authors:

Chung, GY [1]

Seneff, S [1]

Affiliations:

[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA

Source:

SPEECH COMMUNICATION | 1999 / Vol. 27 / Issue 02

Funding:

U.S. National Science Foundation;

Keywords:

duration modelling; prosodic modelling; speech recognition;

DOI:

10.1016/S0167-6393(98)00071-5

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework which is a flexible unified sublexical representation designed for speech applications. This duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure. From this, we derive a robust measure for the rate of speech. The model uses two sets of statistical models - a first set based on relative duration between sublexical units and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore some speech timing phenomena such as the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we successfully demonstrate the utility of durational information for recognition applications. In phonetic recognition, we achieve a relative improvement of up to 7.7% by incorporating our model over and above a standard phone duration model, and similarly, in a word spotting task, an improvement from 89.3 to 91.6 (FOM) has resulted. (C) 1999 Elsevier Science B.V. All rights reserved.


Pages: 113-134

Page count: 22
