A hierarchical duration model for speech recognition based on the ANGIE framework

Cited by: 5

Authors:

Chung, GY [1]

Seneff, S [1]

Affiliations:

[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA

Source:

SPEECH COMMUNICATION | 1999 / Vol. 27 / Issue 2

Funding:

National Science Foundation (USA);

Keywords:

duration modelling; prosodic modelling; speech recognition;

DOI:

10.1016/S0167-6393(98)00071-5

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework, a flexible, unified sublexical representation designed for speech applications. The duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure. From this, we derive a robust measure of the rate of speech. The model uses two sets of statistical models: a first set based on relative duration between sublexical units, and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore some speech timing phenomena, such as the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we demonstrate the utility of durational information for recognition applications. In phonetic recognition, we achieve a relative improvement of up to 7.7% by incorporating our model over and above a standard phone duration model; similarly, in a word spotting task, the figure of merit (FOM) improves from 89.3 to 91.6. © 1999 Elsevier Science B.V. All rights reserved.


Pages: 113-134

Number of pages: 22
