A hierarchical duration model for speech recognition based on the ANGIE framework

Cited by: 5
Authors
Chung, GY [1]
Seneff, S [1]
Affiliation
[1] MIT, Comp Sci Lab, Spoken Language Syst Grp, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA)
Keywords
duration modelling; prosodic modelling; speech recognition
DOI
10.1016/S0167-6393(98)00071-5
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework, a flexible, unified sublexical representation designed for speech applications. The duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure, from which we derive a robust measure of the rate of speech. The model uses two sets of statistical models: one based on the relative duration between sublexical units, and one based on absolute duration normalized with respect to the speaking rate. We have used this paradigm to explore several speech timing phenomena, such as the secondary effects of speaking-rate variation on relative duration, the characteristics of anomalously slow words, and prepausal lengthening. Finally, we demonstrate the utility of durational information for recognition applications. In phonetic recognition, incorporating our model on top of a standard phone duration model yields a relative improvement of up to 7.7%; similarly, in a word spotting task, the figure of merit (FOM) improves from 89.3 to 91.6. (C) 1999 Elsevier Science B.V. All rights reserved.
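The abstract describes the relative-duration and rate-normalized models only in outline. The following Python sketch illustrates the general idea on a toy parse tree. The `Node` class, the two-level word/syllable/phone tree, the per-phone mean-duration table `mean_dur`, and the simple definition of speaking rate as observed duration divided by expected duration are all illustrative assumptions; they are not the ANGIE framework's actual data structures or the authors' hierarchical normalization procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node (e.g. word -> syllable -> phone); not ANGIE's real structure."""
    label: str
    children: List["Node"] = field(default_factory=list)
    duration: float = 0.0  # seconds; set only on leaf nodes

    def total(self) -> float:
        # A leaf carries its own duration; an internal node spans the sum of its children.
        if not self.children:
            return self.duration
        return sum(child.total() for child in self.children)

    def leaves(self) -> List["Node"]:
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


def relative_durations(root: Node) -> List[Tuple[str, str, float]]:
    """(parent, child, child_duration / parent_duration), depth-first, parents before children."""
    result: List[Tuple[str, str, float]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        parent_total = node.total()
        for child in node.children:
            result.append((node.label, child.label, child.total() / parent_total))
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return result


def speaking_rate(word: Node, mean_dur: Dict[str, float]) -> float:
    """Assumed rate measure: observed word duration over the sum of mean leaf durations.

    Values above 1 mean the word was spoken more slowly than the (hypothetical) averages.
    """
    expected = sum(mean_dur[leaf.label] for leaf in word.leaves())
    return word.total() / expected


def rate_normalized(node: Node, rate: float) -> float:
    """Absolute duration rescaled to what it would be at the average speaking rate."""
    return node.total() / rate


if __name__ == "__main__":
    # Toy word with made-up phone durations (seconds).
    word = Node("table", [
        Node("syl1", [Node("t", duration=0.05), Node("ey", duration=0.15)]),
        Node("syl2", [Node("b", duration=0.04), Node("el", duration=0.16)]),
    ])
    means = {"t": 0.05, "ey": 0.10, "b": 0.05, "el": 0.10}  # hypothetical averages
    for parent, child, frac in relative_durations(word):
        print(f"{parent} -> {child}: {frac:.2f}")
    rate = speaking_rate(word, means)
    print(f"speaking rate: {rate:.2f}")
    print(f"rate-normalized word duration: {rate_normalized(word, rate):.2f} s")
```

For the toy word, each syllable takes half of the word's duration, the vowel `ey` takes 0.75 of its syllable, and the word lasts about 1.33 times its expected duration, so its rate-normalized duration comes out at about 0.30 s. Relative durations are unaffected by a uniform change in tempo, which is one reason to keep them as a separate model from the rate-normalized absolute durations.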
Pages: 113-134
Number of pages: 22