ON THE USE OF MORPHOLOGICAL ANALYSIS FOR DIALECTAL ARABIC SPEECH RECOGNITION

Cited by: 0
Authors
Afify, Mohamed [1 ]
Sarikaya, Ruhi [1 ]
Kuo, Hong-Kwang Jeff [1 ]
Besacier, Laurent [1 ]
Gao, Yuqing [1 ]
Affiliation
[1] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006
Keywords
Speech recognition; language modeling; Dialectal Arabic; morphological analysis; prefixes and suffixes
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Arabic has a large number of affixes that can modify a stem to form words. In automatic speech recognition (ASR), this leads to a high out-of-vocabulary (OOV) rate for a typical lexicon size, and hence a potential increase in word error rate (WER). This is even more pronounced for dialects of Arabic, where additional affixes are often introduced and the available data is typically sparse. To address this problem, we introduce a simple word decomposition algorithm that requires only a text corpus and a predefined list of affixes. Using this algorithm to create the lexicon for Iraqi Arabic ASR yields about a 10% relative improvement in WER. In addition, using the union of the segmented and unsegmented vocabularies and interpolating the corresponding language models reduces WER further. The net WER improvement is about 13% relative.
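The abstract does not spell out the decomposition rules, so the following is only a minimal sketch of what an affix-list-based word decomposition could look like, not the authors' algorithm. It assumes that at most one prefix and one suffix are stripped, and that a split is accepted only when the remaining stem is long enough and itself occurs as a word in the corpus vocabulary. The function name `segment`, the `min_stem` parameter, the `+` boundary markers, and the romanized toy words are all hypothetical.

```python
def segment(word, prefixes, suffixes, vocab, min_stem=2):
    """Split `word` into [prefix+] stem [+suffix] tokens.

    Assumed acceptance rule (not taken from the paper): the stem must have
    at least `min_stem` characters and must appear in `vocab` on its own.
    Returns [word] unchanged when no split is accepted.
    """
    # The empty string stands for "strip nothing on this side".
    prefix_options = [""] + [p for p in prefixes if word.startswith(p)]
    suffix_options = [""] + [s for s in suffixes if word.endswith(s)]

    best = None  # (characters stripped, prefix, stem, suffix)
    for p in prefix_options:
        for s in suffix_options:
            if not p and not s:
                continue  # nothing stripped: not a decomposition
            stem = word[len(p):len(word) - len(s)]
            if len(stem) < min_stem or stem not in vocab:
                continue
            stripped = len(p) + len(s)
            # Prefer the split that removes the most affix material.
            if best is None or stripped > best[0]:
                best = (stripped, p, stem, s)

    if best is None:
        return [word]
    _, p, stem, s = best
    tokens = []
    if p:
        tokens.append(p + "+")
    tokens.append(stem)
    if s:
        tokens.append("+" + s)
    return tokens


# Toy data for illustration only (romanized, not from the paper).
corpus = "ktAb AlktAb wktAbhm byt"
vocab = set(corpus.split())
prefixes = {"Al", "w", "b"}
suffixes = {"hm", "k"}

for w in corpus.split():
    print(w, "->", segment(w, prefixes, suffixes, vocab))
```

Requiring the stem to be an attested word is one way to avoid spurious splits, such as treating the first letter of `byt` as a prefix. A segmented lexicon and language model could then be built from the resulting token stream.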
Pages: 277-280
Page count: 4