Syllable language models for Mandarin speech recognition: Exploiting character language models

Cited by: 18
Authors
Liu, Xunying [1]
Hieronymus, James L. [2]
Gales, Mark J. F. [1]
Woodland, Philip C. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
CHINESE-LANGUAGE; ADAPTATION; ALGORITHM;
DOI
10.1121/1.4768800
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
Mandarin Chinese is based on characters which are syllabic in nature and morphological in meaning. All spoken languages have syllabiotactic rules which govern the construction of syllables and their allowed sequences. These constraints are not as restrictive as those learned from word sequences, but they can provide additional useful linguistic information. Hence, it is possible to improve speech recognition performance by appropriately combining these two types of constraints. For the Chinese language considered in this paper, character-level language models (LMs) can be used as a first-level approximation to allowed syllable sequences. To test this idea, word-level and character-level n-gram LMs were trained on 2.8 billion words (equivalent to 4.3 billion characters) of text from a wide collection of sources. Both hypothesis-based and model-based combination techniques were investigated to combine word-level and character-level LMs. Significant character error rate reductions of up to 7.3% relative were obtained on a state-of-the-art Mandarin Chinese broadcast audio recognition task using an adapted history-dependent multi-level LM that performs a log-linear combination of character-level and word-level LMs. This supports the hypothesis that character or syllable sequence models are useful for improving Mandarin speech recognition performance. © 2013 Acoustical Society of America. [http://dx.doi.org/10.1121/1.4768800]
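Illustrative note: the log-linear combination named in the abstract can be sketched, in its simplest form, as a weighted sum of the word-level and character-level LM log-probabilities that is then used to rescore competing hypotheses. The sketch below is not the paper's adapted history-dependent model; it assumes fixed global weights, an N-best list as input, and hypothetical field names (`text`, `acoustic`, `word_lm`, `char_lm`) with made-up toy scores.

```python
def log_linear_score(word_logprob, char_logprob, word_weight=0.5, char_weight=0.5):
    """Unnormalised log-linear combination of two LM log-probabilities.

    The weights are fixed here (an assumption); they do not depend on history.
    """
    return word_weight * word_logprob + char_weight * char_logprob


def rescore(hypotheses, word_weight=0.5, char_weight=0.5):
    """Return the hypothesis with the highest acoustic + combined LM score.

    Each hypothesis is a dict with hypothetical keys 'text', 'acoustic',
    'word_lm' and 'char_lm'; all scores are log-domain values.
    """
    def total(hyp):
        lm = log_linear_score(hyp["word_lm"], hyp["char_lm"], word_weight, char_weight)
        return hyp["acoustic"] + lm

    return max(hypotheses, key=total)


# Toy N-best list with invented scores, for illustration only.
nbest = [
    {"text": "hyp_a", "acoustic": -100.0, "word_lm": -20.0, "char_lm": -30.0},
    {"text": "hyp_b", "acoustic": -101.0, "word_lm": -22.0, "char_lm": -24.0},
]

best_combined = rescore(nbest)                       # uses both LMs
best_word_only = rescore(nbest, 1.0, 0.0)            # word-level LM alone
```

With these toy numbers the word-level LM alone prefers `hyp_a`, while the combined score prefers `hyp_b`, which is the kind of re-ranking a character-level constraint can introduce. How the weights are set, and how they vary with history, is left open here.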
Pages: 519-528
Number of pages: 10