A hierarchical duration model for speech recognition based on the ANGIE framework

Times cited: 5
Authors
Chung, GY [1]
Seneff, S [1]
Affiliation
[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA);
Keywords
duration modelling; prosodic modelling; speech recognition;
DOI
10.1016/S0167-6393(98)00071-5
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework, a flexible, unified sublexical representation designed for speech applications. The duration model captures durational phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure, from which we derive a robust measure of speaking rate. The model uses two sets of statistical models: the first is based on the relative durations of sublexical units, and the second is based on absolute durations normalized with respect to speaking rate. We have used this paradigm to explore several speech timing phenomena, such as the secondary effects of speaking-rate variation on relative duration, the characteristics of anomalously slow words, and prepausal lengthening. Finally, we demonstrate the utility of durational information for recognition applications. In phonetic recognition, incorporating our model on top of a standard phone duration model yields a relative improvement of up to 7.7%; similarly, in a word-spotting task, the figure of merit (FOM) improves from 89.3 to 91.6. © 1999 Elsevier Science B.V. All rights reserved.
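The abstract names three quantities: relative durations of sublexical units within a parse tree, a speaking-rate measure derived from the hierarchy, and absolute durations normalized by that rate. The Python sketch below illustrates them on a toy tree. It is not the authors' implementation. The tree layout (word > syllable > phone), the `Node` class, the helper names, the mean-duration table, and the rate definition (observed word duration divided by the sum of mean phone durations) are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Node:
    """One node of a hypothetical sublexical parse tree (word > syllable > phone)."""
    label: str
    duration: float = 0.0  # seconds; set only on leaf (phone) nodes
    children: List["Node"] = field(default_factory=list)

    def total_duration(self) -> float:
        # An internal node's duration is the sum of its children's durations.
        if not self.children:
            return self.duration
        return sum(child.total_duration() for child in self.children)

    def leaves(self) -> Iterator["Node"]:
        # Yield phone-level leaves in left-to-right order.
        if not self.children:
            yield self
        for child in self.children:
            yield from child.leaves()


def relative_durations(node: Node) -> List[Tuple[str, float]]:
    """Each child's share of its parent's duration; the shares sum to 1."""
    total = node.total_duration()
    return [(c.label, c.total_duration() / total) for c in node.children]


def speaking_rate(word: Node, mean_phone_duration: Dict[str, float]) -> float:
    """Hypothetical rate measure (an assumption, not the paper's definition):
    observed word duration divided by the sum of mean phone durations.
    Values above 1.0 mean slower than average; below 1.0 mean faster.
    """
    expected = sum(mean_phone_duration[leaf.label] for leaf in word.leaves())
    return word.total_duration() / expected


def rate_normalized_durations(word: Node, rate: float) -> List[Tuple[str, float]]:
    """Divide each phone's absolute duration by the word's speaking rate."""
    return [(leaf.label, leaf.duration / rate) for leaf in word.leaves()]


if __name__ == "__main__":
    # Toy word "baby" with made-up durations (illustrative only).
    word = Node("baby", children=[
        Node("ba", children=[Node("b", 0.05), Node("ey", 0.15)]),
        Node("by", children=[Node("b", 0.05), Node("iy", 0.20)]),
    ])
    means = {"b": 0.06, "ey": 0.12, "iy": 0.12}  # hypothetical mean durations

    rate = speaking_rate(word, means)
    print(relative_durations(word))
    print(round(rate, 3))
    print(rate_normalized_durations(word, rate))
```

With these toy numbers the word is slower than the assumed means, so its rate exceeds 1.0 and the normalized phone durations shrink accordingly. A real system would estimate mean durations from training data and model the resulting relative and normalized durations statistically, as the abstract describes.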
Pages: 113-134
Number of pages: 22
Related papers
50 in total
  • [41] Evaluating Spoken Language Model Based on Filler Prediction Model in Speech Recognition
    Ohta, Kengo
    Tsuchiya, Masatoshi
    Nakagawa, Seiichi
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, pp. 1558+
  • [42] A Noise Robust Speech Recognition Method Using Model Compensation Based on Speech Enhancement
    Shen, Guanghu
    Jung, Ho-Youl
    Chung, Hyun-Yeol
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2008, 27 (04): 191-199
  • [43] A multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users
    Dekkers, Gert
    van Waterschoot, Toon
    Vanrumste, Bart
    Van Den Broeck, Bert
    Gemmeke, Jort F.
    Van Hamme, Hugo
    Karsmakers, Peter
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, pp. 746-750
  • [44] MAP speaker adaptation of state duration distributions for speech recognition
    Yoma, NB
    Sánchez, JS
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (07): 443-450
  • [45] Parallel and Hierarchical Decision Making for Sparse Coding in Speech Recognition
    Wang, Dong
    Vipperla, Ravichander
    Evans, Nicholas
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, pp. 2568-2571
  • [46] Hierarchical spectro-temporal features for robust speech recognition
    Domont, Xavier
    Heckmann, Martin
    Joublin, Frank
    Goerick, Christian
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, pp. 4417-4420
  • [47] Noisy speech recognition by hierarchical recurrent neural fuzzy networks
    Juang, CF
    Chiou, CT
    Huang, HJ
    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS, 2005, pp. 5122-5125
  • [48] Online hierarchical transformation of hidden Markov models for speech recognition
    Chien, JT
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (06): 656-667
  • [49] A Discriminative Hierarchical PLDA-Based Model for Spoken Language Recognition
    Ferrer, Luciana
    Castan, Diego
    McLaren, Mitchell
    Lawson, Aaron
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30: 2396-2410
  • [50] Auditory-model based robust feature selection for speech recognition
    Koniaris, Christos
    Kuropatwinski, Marcin
    Kleijn, W. Bastiaan
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 127 (02): EL73-EL79