Subspace Gaussian mixture based language modeling for large vocabulary continuous speech recognition

Cited by: 2

Authors:

Sun, Ri Hyon [1]

Chol, Ri Jong [1]

Affiliation:

[1] Kim Il Sung Univ, Coll Informat Sci, Taesong Dist, Pyongyang, North Korea

Source:

SPEECH COMMUNICATION | 2020, Vol. 117

Keywords:

Language modeling; Speech recognition; Recurrent neural network; Subspace Gaussian mixture model;

DOI:

10.1016/j.specom.2020.01.001

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper presents an adaptable continuous-space language modeling approach that combines the longer context information of a recurrent neural network (RNN) with the adaptation ability of the subspace Gaussian mixture model (SGMM), which has been widely used in acoustic modeling for automatic speech recognition (ASR). In large vocabulary continuous speech recognition (LVCSR), it is challenging to construct language models that capture longer word context while ensuring generalization and adaptation ability. Recently, language models based on RNNs and their variants have been broadly studied in this field. The goal of our approach is to obtain history feature vectors of a word that carry longer context information and to model every word with a subspace Gaussian mixture model, as in the Tandem system used in acoustic modeling for ASR. A further goal is to apply the fMLLR adaptation method, which is widely used in SGMM-based acoustic modeling, to adapt the subspace Gaussian mixture based language model (SGMLM). After fMLLR adaptation, the Top-Down and Bottom-Up SGMLMs obtain WERs of 4.15% and 4.61%, respectively, which are better than the 5.70% and 6.01% obtained without adaptation. With fMLLR adaptation, the Top-Down and Bottom-Up SGMLMs also yield absolute word error rate reductions of 1.48% and 1.02% and relative perplexity reductions of 10.02% and 6.46%, respectively, compared to an RNNLM without adaptation.


Pages: 21-27

Page count: 7

