A hierarchical duration model for speech recognition based on the ANGIE framework

Cited by: 5

Authors:

Chung, GY [1]

Seneff, S [1]

Affiliations:

[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA

Source:

SPEECH COMMUNICATION | 1999 / Vol. 27 / No. 02

Funding:

National Science Foundation (USA);

Keywords:

duration modelling; prosodic modelling; speech recognition;

DOI:

10.1016/S0167-6393(98)00071-5

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework, a flexible, unified sublexical representation designed for speech applications. This duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure. From this, we derive a robust measure for the rate of speech. The model uses two sets of statistical models: a first set based on relative duration between sublexical units and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore some speech timing phenomena such as the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we successfully demonstrate the utility of durational information for recognition applications. In phonetic recognition, we achieve a relative improvement of up to 7.7% by incorporating our model over and above a standard phone duration model, and similarly, in a word spotting task, the figure of merit (FOM) improves from 89.3 to 91.6. (C) 1999 Elsevier Science B.V. All rights reserved.


Pages: 113-134

Page count: 22
