UNSUPERVISED LANGUAGE MODEL ADAPTATION USING N-GRAM WEIGHTING

Cited by: 0

Authors:

Haidar, Md. Akmal [1]

O'Shaughnessy, Douglas [1]

Affiliations:

[1] INRS EMT, Montreal, PQ H5A 1K6, Canada

Source:

2011 24TH CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE) | 2011

Keywords:

Mixture models; speech recognition; latent Dirichlet allocation; language model adaptation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In this paper, we introduce the weighting of topic models in mixture language model adaptation using n-grams of the topic models. Topic clusters are formed by a hard-clustering method that assigns one topic to each document, based on the maximum number of words chosen from a topic for that document in Latent Dirichlet Allocation (LDA) analysis. The n-grams of the topics generated by hard-clustering are used to compute the mixture weights of the component topic models. Instead of using all the words of the training vocabulary, the LDA analysis uses selected words, which are chosen by incorporating some information retrieval techniques. The proposed n-gram weighting approach shows a significant reduction in perplexity and word error rate (WER) compared with a unigram weighting approach used in the literature.
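As an illustration only, the hard-clustering step and the mixture combination described in the abstract might be sketched as follows. The function names `hard_cluster` and `mixture_prob` are hypothetical, the per-token topic assignments are assumed to come from some LDA implementation, and the mixture weights are taken as given, because the abstract does not spell out how the n-gram based weights are computed.

```python
from collections import Counter


def hard_cluster(doc_topic_assignments):
    """Assign each document to the one topic that owns most of its word tokens.

    doc_topic_assignments: one list per document, holding the topic id that an
    LDA analysis assigned to each word token (assumed input format).
    Returns a dict mapping topic id -> list of document indices.
    """
    clusters = {}
    for doc_id, topics in enumerate(doc_topic_assignments):
        if not topics:
            continue  # an empty document has no dominant topic
        # Topic with the largest token count in this document.
        top_topic, _ = Counter(topics).most_common(1)[0]
        clusters.setdefault(top_topic, []).append(doc_id)
    return clusters


def mixture_prob(component_probs, weights):
    """Linear interpolation of topic models: P(w|h) = sum_k weights[k] * P_k(w|h).

    component_probs: P_k(w|h) from each topic language model for one event.
    weights: mixture weights, assumed to be already estimated and to sum to 1.
    """
    if len(component_probs) != len(weights):
        raise ValueError("need one weight per component model")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(lam * p for lam, p in zip(weights, component_probs))
```

For example, `hard_cluster([[0, 0, 1], [1, 1, 1, 0]])` places the first document in topic 0 and the second in topic 1; a separate n-gram model would then be trained on each cluster and combined with `mixture_prob`.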

Citation

Pages: 857-860

Page count: 4
