A hierarchical duration model for speech recognition based on the ANGIE framework

Cited by: 5
Authors
Chung, GY [1]
Seneff, S [1]
Affiliations
[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA
Funding
US National Science Foundation
Keywords
duration modelling; prosodic modelling; speech recognition
DOI
10.1016/S0167-6393(98)00071-5
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework, which is a flexible, unified sublexical representation designed for speech applications. This duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure. From this, we derive a robust measure for the rate of speech. The model uses two sets of statistical models: a first set based on relative duration between sublexical units, and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore some speech timing phenomena, such as the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we successfully demonstrate the utility of durational information for recognition applications. In phonetic recognition, we achieve a relative improvement of up to 7.7% by incorporating our model over and above a standard phone duration model, and similarly, in a word spotting task, the figure of merit (FOM) improves from 89.3 to 91.6. (C) 1999 Elsevier Science B.V. All rights reserved.
Pages: 113-134
Page count: 22
Related papers
50 in total
  • [21] A hybrid CTC plus Attention model based on end-to-end framework for multilingual speech recognition
    Liang, Sendong
    Yan, Wei Qi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (28) : 41295 - 41308
  • [22] Speech recognition based on variable information rate model
    Choi, IJ
    Un, CK
    Kim, NS
    ELECTRONICS LETTERS, 1997, 33 (09) : 749 - 750
  • [23] A DOMESTIC SPEECH RECOGNITION BASED ON HIDDEN MARKOV MODEL
    Tao, Jun
    Jiang, Xiaoxiao
    2011 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS, 2011, : 606 - 609
  • [24] SPEECH RECOGNITION OF FOREIGN OUT-OF-VOCABULARY WORDS USING A HIERARCHICAL LANGUAGE MODEL
    Yamamoto, Hirofumi
    Kikui, Genichiro
    Nakamura, Satoshi
    Sagisaka, Yoshinori
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1870 - +
  • [25] Web-based Framework for Assisting Users Using Speech Recognition
    Zahr, Hassan
    Hassan, Hussein Al Haj
    Haydar, Jamal
    2018 19TH INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2018, : 240 - 245
  • [26] Spectral difference for statistical model-based speech enhancement in speech recognition
    Lee, Soojeong
    Chang, Joon-Hyuk
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (23) : 24917 - 24929
  • [27] Robust Cochlear-Model-Based Speech Recognition
    Russo, Mladen
    Stella, Maja
    Sikora, Marjan
    Pekic, Vesna
    COMPUTERS, 2019, 8 (01)
  • [29] A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition
    Juneja, Amit
    Espy-Wilson, Carol
    Journal of the Acoustical Society of America, 2008, 123 (02) : 1154 - 1168
  • [30] Speech Recognition Based on Concatenated Acoustic Feature and LightGBM Model
    Yu, Jiali
    Qu, Yuanyuan
    Zhang, Zhongkai
    Lu, Qidong
    Qin, Zhiliang
    Liu, Xiaowei
    TWELFTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2021, 11719