VARIABLE-LENGTH SEQUENCE MODELING - MULTIGRAMS

Cited: 6
Authors
BIMBOT, F
PIERACCINI, R
LEVIN, E
ATAL, B
Affiliations
[1] ENST, Dept. Signal, CNRS, Paris
[2] Speech Research Department, AT&T Bell Laboratories, Murray Hill
DOI
10.1109/97.388911
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
The conventional n-gram language model exploits dependencies between words and their fixed-length past. This letter presents a model that represents sentences as a concatenation of variable-length sequences of units and describes an algorithm for unsupervised estimation of the model parameters. The approach is illustrated for the segmentation of sequences of letters into subword-like units. It is evaluated as a language model on a corpus of transcribed spoken sentences. Multigrams can provide a significantly lower test set perplexity than n-gram models.
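The core segmentation idea behind multigrams can be illustrated with a short dynamic-programming sketch. This is a simplified, hypothetical illustration (not the authors' implementation): given a dictionary of unit probabilities, it finds the maximum-likelihood split of a string into variable-length units, assuming units are generated independently.

```python
import math

def best_segmentation(seq, probs, max_len=3):
    """Viterbi-style DP: split seq into variable-length units,
    maximizing the product of unit probabilities (independence assumed).
    probs: dict mapping candidate units to probabilities (illustrative)."""
    n = len(seq)
    best = [(-math.inf, None)] * (n + 1)  # (log-prob, backpointer)
    best[0] = (0.0, None)
    for i in range(1, n + 1):
        for l in range(1, min(max_len, i) + 1):
            unit = seq[i - l:i]
            p = probs.get(unit)
            if not p:
                continue
            score = best[i - l][0] + math.log(p)
            if score > best[i][0]:
                best[i] = (score, i - l)
    # Backtrace the best split
    units, i = [], n
    while i > 0:
        j = best[i][1]
        if j is None:
            return None  # seq cannot be covered by known units
        units.append(seq[j:i])
        i = j
    return list(reversed(units))

# Example with made-up probabilities: "ab"+"c" beats "abc" as one unit.
units = best_segmentation("abc", {"ab": 0.6, "c": 0.2,
                                  "a": 0.1, "b": 0.1, "abc": 0.1})
```

In the full algorithm described in the letter, a step like this alternates with re-estimation of the unit probabilities from the resulting segmentations; the sketch above shows only the decoding half.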
Pages: 111-113
Page count: 3
Related Papers
2 items total
  • [1] Jelinek F., Self-organized language modeling for speech recognition, Readings in Speech Recognition (A. Waibel and K. F. Lee, Eds.), pp. 450-506, (1990)
  • [2] Multi-site data collection for a spoken language corpus, Proc. 5th DARPA Workshop Speech and Natural Language, pp. 7-14, (1992)