Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model

Cited by: 40

Authors:

Kim, Myung Jong [1]

Kim, Younggwan [1]

Kim, Hoirin [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Taejon 305701, South Korea

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2015 / Vol. 23 / No. 04

Funding:

National Research Foundation of Singapore;

Keywords:

Dysarthria; pronunciation confusion network; speech intelligibility assessment; structured sparse model; weighted finite state transducer (WFST); RECOGNITION; DISORDERS; SPEAKERS; ERRORS;

DOI:

10.1109/TASLP.2015.2403619

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, a motor speech disorder that impedes the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, the speech utterance is converted into a phone sequence using an automatic speech recognition technique and is then aligned with a canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer to capture pronunciation mappings such as match, substitution, and deletion. The histograms of the pronunciation mappings over a pre-defined word set are used as features. In the prediction step, a structured sparse linear model that incorporates phonological knowledge is proposed; it simultaneously performs phonologically structured sparse feature selection and intelligibility prediction. Evaluation of the proposed method on a database of 109 speakers, consisting of 94 dysarthric and 15 control speakers, yielded a root mean square error of 8.14 against subjectively rated scores in the range of 0 to 100. This performance is promising and suggests that the system could help speech therapists diagnose the degree of speech disorder.
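The feature representation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a plain minimum-edit-distance alignment stands in for the weighted finite state transducer, the speech recognition step is assumed to have already produced the recognized phone sequences, and the function names `align` and `mapping_histogram`, the phone symbols, and the handling of insertions are all assumptions made for illustration.

```python
from collections import Counter


def align(canonical, recognized):
    """Align two phone sequences by minimum edit distance.

    Returns a list of (canonical_phone, recognized_phone) pairs, where
    (p, p) is a match, (p, q) a substitution, (p, None) a deletion and
    (None, q) an insertion. This is a simple stand-in for WFST alignment.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j] = edit distance between canonical[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the mappings.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j]
                == cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])):
            pairs.append((canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def mapping_histogram(word_pairs):
    """Count pronunciation mappings over (canonical, recognized) word pairs."""
    hist = Counter()
    for canonical, recognized in word_pairs:
        hist.update(align(canonical, recognized))
    return hist


# Hypothetical phone sequences for one word spoken twice.
words = [(["k", "ae", "t"], ["t", "ae"]),
         (["k", "ae", "t"], ["k", "ae", "t"])]
histogram = mapping_histogram(words)
```

In this sketch, each speaker's histogram counts would form the feature vector passed to the prediction step; the structured sparse linear model itself is not shown here.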


Pages: 694-704

Number of pages: 11

