PRONUNCIATION VARIATION MODELING FOR ARABIC ASRs: A DIRECT DATA-DRIVEN APPROACH

Cited by: 0

Authors:

Abuzeina, Dia [1]

Elshafei, Moustafa [1]

Affiliation:

[1] King Fahd Univ Petr & Minerals, Dhahran, Saudi Arabia

Source:

THIRD INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND TECHNOLOGY (ICCET 2011) | 2011

Keywords:

Speech recognition; pronunciation variation; direct data-driven approach; pronunciation dictionary adaptation; Modern Standard Arabic;

DOI:

Not available

Chinese Library Classification:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

Pronunciation variation is a major obstacle to improving the performance of Arabic automatic speech recognition (ASR) systems. This phenomenon alters the pronunciation of words beyond the forms listed in the phonetic dictionary, producing a number of out-of-vocabulary (OOV) word forms. This paper presents a direct data-driven approach to modeling pronunciation variation, in which the pronunciation variants are distilled from the training speech. The proposed method adds the pronunciation variants to the ASR pronunciation dictionary as well as to the language model. We start from a baseline Arabic speech recognition system built on the Carnegie Mellon University (CMU) Sphinx3 speech recognition engine. The baseline is based on a 5.4-hour speech corpus of Modern Standard Arabic (MSA) broadcast news, with a phonetic dictionary of 14,234 canonical pronunciations, and achieves a word error rate (WER) of 13.39%. The proposed method identifies variations in the phonetic transcription of the spoken words. The phonemic variants of words are then filtered and added, with distinctive letter spellings, to an expanded phonetic dictionary. In our experiment, 554 variants were added to the basic phonetic dictionary as new words. The artificially added words are also used, together with their sentences, in the language model. Our results show that while the expanded dictionary alone did not yield appreciable improvement, the WER is reduced by 2.04% when the variants are also considered within the language model.
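The dictionary-expansion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the filtering criterion or the spelling scheme, so the frequency threshold `min_count`, the `_v1`-style suffixes, the function names, and the phoneme symbols are all assumptions made for illustration. The sketch collects pronunciations observed in the training speech that differ from the canonical entry, keeps those seen often enough, adds each as a new word with its own spelling, and relabels transcripts so the new words can also appear in language-model training text.

```python
from collections import Counter


def expand_dictionary(canonical, observed, min_count=2):
    """Add frequent pronunciation variants to a dictionary as new words.

    canonical: dict mapping word -> tuple of phonemes (one canonical form).
    observed:  list of (word, phoneme sequence) pairs taken from a
               phone-level transcription of the training speech.
    min_count: hypothetical filter; variants seen fewer times are dropped.
    Returns (expanded dictionary, {(word, pronunciation): new spelling}).
    """
    # Count only pronunciations that differ from the canonical one.
    counts = Counter(
        (word, tuple(pron))
        for word, pron in observed
        if word in canonical and tuple(pron) != tuple(canonical[word])
    )

    expanded = dict(canonical)
    variant_spelling = {}
    next_index = Counter()
    for (word, pron), n in sorted(counts.items()):
        if n < min_count:
            continue  # filtered out as too rare
        next_index[word] += 1
        # Assumed naming scheme: a distinct spelling per kept variant.
        spelling = f"{word}_v{next_index[word]}"
        expanded[spelling] = pron
        variant_spelling[(word, pron)] = spelling
    return expanded, variant_spelling


def relabel_transcript(aligned_sentence, variant_spelling):
    """Replace words by their variant spelling where a kept variant was spoken.

    aligned_sentence: list of (word, phoneme sequence) pairs for one sentence.
    The relabeled sentences can then be added to the language-model text.
    """
    return [variant_spelling.get((w, tuple(p)), w) for w, p in aligned_sentence]
```

Treating each variant as a separate word is what allows the language model to assign it its own probabilities, which is the condition under which the abstract reports the WER reduction.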

Pages: 325-330

Page count: 6
