Speech Recognition Using Augmented Conditional Random Fields

Cited by: 45
Authors
Hifny, Yasser [1]
Renals, Steve [2]
Affiliations
[1] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[2] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9LW, Midlothian, Scotland
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2009 / Vol. 17 / No. 2
Keywords
Augmented conditional random fields (ACRFs); augmented spaces; discriminative compression; hidden Markov models (HMMs); PHONE RECOGNITION; FEATURES;
DOI
10.1109/TASL.2008.2010286
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT phone recognition task, a phone error rate of 23.0% was recorded on the full test set, a significant improvement over comparable HMM-based systems.
Pages: 354-365
Number of pages: 12