A hierarchical duration model for speech recognition based on the ANGIE framework

Cited by: 5
Authors
Chung, GY [1]
Seneff, S [1]
Affiliations
[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA);
Keywords
duration modelling; prosodic modelling; speech recognition;
DOI
10.1016/S0167-6393(98)00071-5
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework which is a flexible unified sublexical representation designed for speech applications. This duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure. From this, we derive a robust measure for the rate of speech. The model uses two sets of statistical models - a first set based on relative duration between sublexical units and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore some speech timing phenomena such as the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we successfully demonstrate the utility of durational information for recognition applications. In phonetic recognition, we achieve a relative improvement of up to 7.7% by incorporating our model over and above a standard phone duration model, and similarly, in a word spotting task, an improvement from 89.3 to 91.6 (FOM) has resulted. (C) 1999 Elsevier Science B.V. All rights reserved.
Pages: 113-134
Page count: 22
Related papers
50 in total
  • [31] Semantic Enhancement Framework for Robust Speech Recognition
    Yang, Baochen
    Yu, Kai
    MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2022, 2023, 1765 : 81 - 88
  • [32] A hybrid speech recognition model based on HMM and fuzzy PPM
    Bao, P
    Sim, A
    1998 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, VOLS 1-5, 1998, : 4148 - 4153
  • [33] An Acoustic Model For English Speech Recognition Based On Deep Learning
    Ling, Zhang
    2019 11TH INTERNATIONAL CONFERENCE ON MEASURING TECHNOLOGY AND MECHATRONICS AUTOMATION (ICMTMA 2019), 2019, : 610 - 614
  • [34] ON LANGUAGE MODEL INTEGRATION FOR RNN TRANSDUCER BASED SPEECH RECOGNITION
    Zhou, Wei
    Zheng, Zuoyun
    Schlueter, Ralf
    Ney, Hermann
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8407 - 8411
  • [35] A Speech Recognition Acoustic Model Based on LSTM-CTC
    Zhang, Yiwen
    Lu, Xuanmin
    2018 IEEE 18TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT), 2018, : 1052 - 1055
  • [36] Language Model Based Non-speech Recognition Method
    Zhang, Qinglin
    Chen, Jianfeng
    Bai, Jisheng
    CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019), 2019,
  • [37] SPEECH RECOGNITION MODEL COMPRESSION
    Sakthi, Madhumitha
    Tewfik, Ahmed
    Pawate, Raj
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7869 - 7873
  • [38] English speech emotion recognition method based on speech recognition
    Liu, Man
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (2) : 391 - 398
  • [39] A TONE RECOGNITION FRAMEWORK FOR CONTINUOUS MANDARIN SPEECH
    He, Lei
    Hao, Jie
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1575 - 1578