EduSpeak®: A speech recognition and pronunciation scoring toolkit for computer-aided language learning applications

Cited by: 48

Authors:

Franco, Horacio

Bratt, Harry

Rossier, Romain

Gadde, Venkata Rao

Shriberg, Elizabeth

Abrash, Victor

Precoda, Kristin

Affiliation:

[1] SRI International, Menlo Park, CA 94025-3493

Source:

Language Testing | 2010 / Vol. 27 / Issue 3

Keywords:

automatic pronunciation scoring; computer aided language learning; mispronunciation detection

DOI:

10.1177/0265532210364408

Chinese Library Classification (CLC) number:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

SRI International's EduSpeak® system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to specific production problems. We review our approach to pronunciation scoring, where our aim is to estimate the grade that a human expert would assign to the pronunciation quality of a paragraph or a phrase. Using databases of nonnative speech and corresponding human ratings at the sentence level, we evaluate different machine scores that can be used as predictor variables to estimate pronunciation quality. For more specific feedback on pronunciation, the EduSpeak toolkit supports a phone-level mispronunciation detection functionality that automatically flags specific phone segments that have been mispronounced. Phone-level information makes it possible to provide the student with feedback about specific pronunciation mistakes. Two approaches to mispronunciation detection were evaluated in a phonetically transcribed database of 130,000 phones uttered in continuous speech sentences by 206 nonnative speakers. Results show that classification error of the best system, for the phones that can be reliably transcribed, is only slightly higher than the average pairwise disagreement between the human transcribers.

Pages: 401-418

Page count: 18

References (23 in total):

[1]

[Anonymous], 1984, OLSHEN STONE CLASSIF, DOI 10.2307/2530946

[2]

BERNSTEIN J, 1990, P INT C SPOK LANG PR, P1185

[3]

Bernstein J., 2000, P INTEGRATING SPEECH, P57

[4]

Bialystok E., 1994, In other words: The science and psychology of second-language acquisition

[5]

Bratt H., 1998, P INT C SPOK LANG PR, P1539

[6] Speaker adaptation using combined transformation and Bayesian methods [J].

Digalakis, VV ;

Neumeyer, LG .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (04) :294-300

[7] Genones: Generalized mixture tying in continuous hidden Markov model-based speech recognizers [J].

Digalakis, VV ;

Monaco, P ;

Murveit, H .

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1996, 4 (04) :281-289

[8] Combination of machine scores for automatic grading of pronunciation quality [J].

Franco, H ;

Neumeyer, L ;

Digalakis, V ;

Ronen, O .

SPEECH COMMUNICATION, 2000, 30 (2-3) :121-130

[9]

FRANCO H, 1997, P INT C AC SPEECH SI, P1471

[10]

Franco H, 2000, P INSTILL2000 INT SP, V2000, P102
