Korean large vocabulary continuous speech recognition with morpheme-based recognition units

Cited by: 56
Authors
Kwon, OW
Park, J
Affiliations
[1] Korea Adv Inst Sci & Technol, Brain Sci Res Ctr, Yuseong gu, Taejon 305701, South Korea
[2] ETRI, Spoken Language Proc Team, Yuseong Gu, Taejon 305350, South Korea
Keywords
Korean large vocabulary continuous speech recognition; morpheme-based language model; broadcast news transcription;
DOI
10.1016/S0167-6393(02)00031-6
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In Korean writing, a space is placed between two adjacent word-phrases, each of which generally corresponds to two or three words in English in a semantic sense. If the word-phrase is used as a recognition unit for Korean large vocabulary continuous speech recognition (LVCSR), the out-of-vocabulary (OOV) rate becomes very large. If a morpheme or a syllable is used instead, a severe inter-morpheme coarticulation problem arises due to short morphemes. We propose to use a merged morpheme as the recognition unit and pronunciation-dependent entries in a language model (LM) so that we can reduce such difficulties and incorporate the between-word phonology rule into the decoding algorithm of a Korean LVCSR system. Starting from the original morpheme units defined in Korean morphology, we merge pairs of short and frequent morphemes into larger units by using a rule-based method and a statistical method. We define the merged morpheme unit as a word and use it as the recognition unit. The performance of the system was evaluated in two business-related tasks: a read speech recognition task and a broadcast news transcription task. The OOV rate was reduced to a level comparable to that of American English in both tasks. In the read speech recognition task, with a 32k vocabulary and a word-based trigram LM computed from a newspaper text corpus, the word error rate (WER) of the baseline system was reduced from 25.0% to 20.0% by cross-word modeling and pronunciation-dependent language modeling, and finally to 15.5% by enlarging the speech database and text corpora. For the broadcast news transcription task, we showed that the statistical method reduced the WER by 3.4% relative to the baseline system without morpheme merging, and that both of the proposed methods yielded similar performance. Applying all the proposed techniques, we achieved 17.6% WER for clean speech and 27.7% for noisy speech. © 2002 Elsevier Science B.V. All rights reserved.
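The abstract does not state the exact criterion used by the statistical merging method, so the following is only a rough illustrative sketch of the general idea of merging pairs of short and frequent morphemes. It assumes raw adjacent-pair frequency as the score and character length as the measure of "short"; the function name `merge_frequent_pairs`, the thresholds `max_len`, `min_count` and `rounds`, the `+` joiner, and the romanized toy tokens are all hypothetical and not taken from the paper.

```python
from collections import Counter


def merge_frequent_pairs(sentences, max_len=3, min_count=2, rounds=1):
    """Greedily merge the most frequent adjacent pair of short units.

    sentences: list of lists of morpheme strings.
    max_len, min_count, rounds: hypothetical thresholds (assumptions,
    not values from the paper).
    """
    sentences = [list(s) for s in sentences]
    for _ in range(rounds):
        # Count adjacent pairs in which both units are short.
        counts = Counter()
        for sent in sentences:
            for a, b in zip(sent, sent[1:]):
                if len(a) <= max_len and len(b) <= max_len:
                    counts[(a, b)] += 1
        if not counts:
            break
        (a, b), n = counts.most_common(1)[0]
        if n < min_count:
            break
        merged = a + "+" + b
        # Rewrite every sentence, replacing the chosen pair left to right.
        rewritten = []
        for sent in sentences:
            out, i = [], 0
            while i < len(sent):
                if i + 1 < len(sent) and sent[i] == a and sent[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(sent[i])
                    i += 1
            rewritten.append(out)
        sentences = rewritten
    return sentences


# Toy corpus of romanized placeholder morphemes (illustrative only).
corpus = [
    ["jip", "e", "seo"],
    ["hak", "gyo", "e", "seo"],
    ["bang", "e", "seo"],
]
merged_corpus = merge_frequent_pairs(corpus)
```

In this toy corpus the pair `e` and `seo` occurs three times, so a single round merges it into one unit `e+seo` and leaves the other tokens unchanged. Longer merged units of this kind are what would then be entered in the recognition vocabulary in place of the short morphemes.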
Pages: 287-300
Number of pages: 14