Within-word pronunciation variation modeling for Arabic ASRs: a direct data-driven approach

Cited by: 11

Authors:

AbuZeina, Dia [1]

Al-Khatib, Wasfi [1]

Elshafei, Moustafa [1]

Al-Muhtaseb, Husni [1]

Affiliation:

[1] King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2012 / Volume 15 / Issue 02

Keywords:

Speech recognition; Pronunciation variation; Data-driven approach; Language model; Modern standard Arabic;

DOI:

10.1007/s10772-011-9122-4

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808; 0809

Abstract:

Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond the forms listed in the pronunciation dictionary, leading to a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variations, in which the pronunciation variants are distilled from the training speech corpus. The proposed method consists of performing phoneme recognition, followed by a sequence alignment between the observation phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine. The baseline system is built on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations, and achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not yield appreciable improvements, the word error rate is significantly reduced by 2.22% when the variants are represented within the language model.
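The alignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a unit-cost minimum-edit-distance alignment (the abstract does not name the alignment algorithm), and the function names `align` and `collect_variants`, the word labels, and the phoneme symbols are all hypothetical. It aligns the recognizer's phoneme output against the concatenated dictionary pronunciations and keeps, per word, any observed sequence that differs from the canonical one.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (reference phoneme, observed phoneme)


def align(ref: Sequence[str], obs: Sequence[str]) -> List[Pair]:
    """Globally align two phoneme sequences by minimum edit distance.

    Assumption: substitutions, deletions and insertions all cost 1.
    None on the observed side marks a deletion; None on the reference
    side marks an insertion.
    """
    n, m = len(ref), len(obs)
    # cost[i][j] = edit distance between ref[:i] and obs[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != obs[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != obs[j - 1])):
            pairs.append((ref[i - 1], obs[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # reference phoneme was deleted
            i -= 1
        else:
            pairs.append((None, obs[j - 1]))  # extra phoneme was inserted
            j -= 1
    pairs.reverse()
    return pairs


def collect_variants(
    words: Sequence[Tuple[str, Sequence[str]]],
    obs: Sequence[str],
) -> Dict[str, Set[Tuple[str, ...]]]:
    """Collect unique per-word observed pronunciations that differ from the
    canonical ones. `words` is a non-empty list of (word, canonical phonemes).
    Inserted phonemes are attributed to the most recent reference word
    (a simplifying assumption of this sketch)."""
    ref = [p for _, phones in words for p in phones]
    owner = [k for k, (_, phones) in enumerate(words) for _ in phones]
    observed: List[List[str]] = [[] for _ in words]
    current, r = 0, 0
    for ref_p, obs_p in align(ref, obs):
        if ref_p is not None:
            current = owner[r]
            r += 1
        if obs_p is not None:
            observed[current].append(obs_p)

    variants: Dict[str, Set[Tuple[str, ...]]] = {}
    for (word, phones), seen in zip(words, observed):
        if seen and seen != list(phones):
            variants.setdefault(word, set()).add(tuple(seen))
    return variants
```

For instance, calling `collect_variants([("w1", ["k", "i", "t", "a", "b"])], ["k", "t", "a", "b"])` records the shortened sequence as a variant of the hypothetical word `w1`. Adding such variants to the dictionary and to the language model, as the abstract describes, is outside the scope of this sketch.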


Pages: 65-75

Number of pages: 11
