Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model

Cited by: 40
Authors
Kim, Myung Jong [1 ]
Kim, Younggwan [1 ]
Kim, Hoirin [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Taejon 305701, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Dysarthria; pronunciation confusion network; speech intelligibility assessment; structured sparse model; weighted finite state transducer (WFST); RECOGNITION; DISORDERS; SPEAKERS; ERRORS;
DOI
10.1109/TASLP.2015.2403619
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, which is a motor speech disorder impeding the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, the speech utterance is converted into a phone sequence using an automatic speech recognition technique and is then aligned with a canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer to capture the pronunciation mappings such as match, substitution, and deletion. The histograms of the pronunciation mappings on a pre-defined word set are used for features. Next, in the prediction step, a structured sparse linear model incorporated with phonological knowledge that simultaneously addresses phonologically structured sparse feature selection and intelligibility prediction is proposed. Evaluation of the proposed method on a database of 109 speakers consisting of 94 dysarthric and 15 control speakers yielded a root mean square error of 8.14 compared to subjectively rated scores in the range of 0 to 100. This is a promising performance in which the system can be successfully applied to help speech therapists in diagnosing the degree of speech disorder.
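The feature representation step described in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper aligns phone sequences with a weighted finite state transducer, whereas this sketch substitutes a plain unit-cost edit-distance alignment; the function names `align` and `mapping_histogram`, the phone symbols, and the handling of insertions are hypothetical and not taken from the paper. The structured sparse prediction model is not covered here.

```python
from collections import Counter


def align(canonical, recognized):
    """Align two phone sequences with unit-cost edit distance.

    Stand-in for the WFST alignment in the paper (an assumption).
    Returns (canonical_phone, recognized_phone) pairs; None marks a
    deleted canonical phone or an inserted recognized phone.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j]: minimum edits turning canonical[:i] into recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the mappings.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j]
                == cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])):
            pairs.append((canonical[i - 1], recognized[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def mapping_histogram(word_pairs):
    """Count pronunciation mappings over (canonical, recognized) word pairs."""
    hist = Counter()
    for canonical, recognized in word_pairs:
        for c, r in align(canonical, recognized):
            if c is None:
                hist[("ins", r)] += 1
            elif r is None:
                hist[("del", c)] += 1
            elif c == r:
                hist[("match", c)] += 1
            else:
                hist[("sub", c, r)] += 1
    return hist


# Hypothetical example: canonical "k ae t" recognized as "t ae".
example = mapping_histogram([(["k", "ae", "t"], ["t", "ae"])])
```

In this sketch each histogram key identifies one mapping type, so the counts over a fixed word set could serve as a feature vector for a downstream regression model.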
Pages: 694-704
Number of pages: 11
Related papers
47 in total
[21]   Structured Sparsity Models for Reverberant Speech Separation [J].
Asaei, Afsaneh ;
Golbabaee, Mohammad ;
Bourlard, Herve ;
Cevher, Volkan .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (03) :620-633
[22]   Structured Sparsity through Convex Optimization [J].
Bach, Francis ;
Jenatton, Rodolphe ;
Mairal, Julien ;
Obozinski, Guillaume .
STATISTICAL SCIENCE, 2012, 27 (04) :450-468
[23]   Automatic Intelligibility Assessment of Speakers After Laryngeal Cancer by Means of Acoustic Modeling [J].
Bocklet, Tobias ;
Riedhammer, Korbinian ;
Noeth, Elmar ;
Eysholdt, Ulrich ;
Haderlein, Tino .
JOURNAL OF VOICE, 2012, 26 (03) :390-397
[24]   Identification of Articulation Error Patterns Using a Novel Dependence Network [J].
Chen, Yeou-Jiunn .
IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, 2011, 58 (11) :3061-3068
[25]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[26]   Duffy J. R., 2005, Motor Speech Disorders: Substrates, Differential Diagnosis, and Management
[27]   Disordered speech assessment using automatic methods based on quantitative measures [J].
Gu, LY ;
Harris, JG ;
Shrivastav, R ;
Sapienza, C .
EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2005, 2005 (09) :1400-1409
[28]   Pronouncibility index (Π): a distance-based and confusion-based speech quality measure for dysarthric speakers [J].
Kayasith, Prakasith ;
Theeramunkong, Thanaruk .
KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 27 (03) :367-391
[29]   Frequency of consonant articulation errors in dysarthric speech [J].
Kim, Heejin ;
Martin, Katie ;
Hasegawa-Johnson, Mark ;
Perlman, Adrienne .
CLINICAL LINGUISTICS & PHONETICS, 2010, 24 (10) :759-770
[30]   Kim M. J., 2007, Human Brain Research Consulting