VARIABLE-LENGTH SEQUENCE MODELING - MULTIGRAMS

Cited: 6
Authors
BIMBOT, F
PIERACCINI, R
LEVIN, E
ATAL, B
Affiliations
[1] ENST, Dept. Signal, CNRS, Paris
[2] Speech Research Department, AT&T Bell Laboratories, Murray Hill
DOI
10.1109/97.388911
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
The conventional n-gram language model exploits dependencies between words and their fixed-length past. This letter presents a model that represents sentences as a concatenation of variable-length sequences of units and describes an algorithm for unsupervised estimation of the model parameters. The approach is illustrated for the segmentation of sequences of letters into subword-like units. It is evaluated as a language model on a corpus of transcribed spoken sentences. Multigrams can provide a significantly lower test set perplexity than n-gram models.
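The core segmentation idea behind multigrams can be illustrated with a short dynamic-programming sketch. This is a simplified, hypothetical illustration (not the authors' implementation): given a dictionary of unit probabilities, it finds the maximum-likelihood split of a string into variable-length units, assuming units are generated independently.

```python
import math

def best_segmentation(seq, probs, max_len=3):
    """Viterbi-style DP: split seq into variable-length units,
    maximizing the product of unit probabilities (independence assumed).
    probs: dict mapping candidate units to probabilities (illustrative)."""
    n = len(seq)
    best = [(-math.inf, None)] * (n + 1)  # (log-prob, backpointer)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for l in range(1, min(max_len, i) + 1):
            unit = seq[i - l:i]
            p = probs.get(unit)
            if not p:
                continue
            score = best[i - l][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, i - l)
    # Backtrace the best split
    units, i = [], n
    while i > 0:
        j = best[i][1]
        if j is None:
            return None  # seq cannot be covered by known units
        units.append(seq[j:i])
        i = j
    return list(reversed(units))

# Example with made-up probabilities: "ab"+"c" beats "abc" as one unit.
units = best_segmentation("abc", {"ab": 0.6, "c": 0.2,
                                  "a": 0.1, "b": 0.1, "abc": 0.1})
```

In the full algorithm described in the letter, a step like this alternates with re-estimation of the unit probabilities from the resulting segmentations; the sketch above shows only the decoding half.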
Pages: 111-113
Page count: 3
Related Papers
2 items total
  • [1] Jelinek F., Self-organized language modeling for speech recognition, Readings in Speech Recognition (A. Waibel and K. F. Lee, Eds.), pp. 450-506, (1990)
  • [2] Multi-site data collection for a spoken language corpus, Proc. 5th DARPA Workshop Speech and Natural Language, pp. 7-14, (1992)