Lexicon optimization based on discriminative learning for automatic speech recognition of agglutinative language

Times cited: 7
Authors
Ablimit, Mijit [1]
Kawahara, Tatsuya [1]
Hamdulla, Askar [2]
Affiliations
[1] Kyoto Univ, Sch Informat, Kyoto, Japan
[2] Xinjiang Univ, Inst Informat Engn, Urumqi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Speech recognition; Language model; Lexicon; Morpheme; Discriminative learning; Uyghur;
DOI
10.1016/j.specom.2013.09.011
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
For automatic speech recognition (ASR) of agglutinative languages, the choice of lexical unit is not obvious. The morpheme unit is usually adopted to ensure sufficient coverage, but many morphemes are short, resulting in weak constraints and possible confusion. We propose a discriminative approach for lexicon optimization that directly contributes to ASR error reduction by taking into account not only linguistic constraints but also acoustic-phonetic confusability. It is based on an evaluation function for each word, defined by a set of features and their weights; the weights are optimized using the difference in word error rates (WERs) between ASR hypotheses obtained by the morpheme-based model and those obtained by the word-based model. Word or sub-word entries with higher evaluation scores are then selected to be added to the lexicon. We investigate several discriminative models to realize this approach. Specifically, we implement it with support vector machines (SVM), a logistic regression (LR) model, and the simple perceptron algorithm. This approach was successfully applied to a Uyghur large-vocabulary continuous speech recognition system, resulting in a significant reduction of WER with a modest lexicon size and a small out-of-vocabulary rate. The use of SVM for a sub-word lexicon results in the best performance, outperforming the word-based model as well as conventional statistical concatenation approaches. The proposed learning approach is realized in an unsupervised manner because it does not require correct transcriptions of the training data. (C) 2013 Elsevier B.V. All rights reserved.
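The scoring-and-selection idea in the abstract can be illustrated with a minimal perceptron sketch. This is not the authors' implementation: the feature names (`bias`, `len`), the +1/-1 labels (assumed here to indicate whether treating a candidate as a single lexical entry helped or hurt when the word-based and morpheme-based hypotheses were compared), and the helper names `score`, `train_perceptron`, and `select_entries` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A candidate entry is described by named features (names are hypothetical).
Features = Dict[str, float]


def score(weights: Dict[str, float], feats: Features) -> float:
    """Evaluation function: weighted sum of the candidate's features."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_perceptron(
    examples: List[Tuple[Features, int]],
    epochs: int = 10,
    lr: float = 1.0,
) -> Dict[str, float]:
    """Simple perceptron: update weights only on misclassified examples.

    Each example is (features, label) with label +1 or -1. The labels are
    assumed to come from comparing two sets of ASR hypotheses, not from
    reference transcripts.
    """
    weights: Dict[str, float] = {}
    for _ in range(epochs):
        mistakes = 0
        for feats, label in examples:
            if label * score(weights, feats) <= 0:
                for name, value in feats.items():
                    weights[name] = weights.get(name, 0.0) + lr * label * value
                mistakes += 1
        if mistakes == 0:  # converged on the training set
            break
    return weights


def select_entries(
    candidates: Dict[str, Features],
    weights: Dict[str, float],
    top_n: int,
) -> List[str]:
    """Return the top_n candidate entries ranked by evaluation score."""
    ranked = sorted(
        candidates,
        key=lambda entry: score(weights, candidates[entry]),
        reverse=True,
    )
    return ranked[:top_n]
```

In this sketch, candidates with the highest scores would be the ones added to the lexicon; the SVM and logistic regression variants mentioned in the abstract would replace only the training step, under the same assumed feature representation.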
Pages: 78-87
Number of pages: 10