Automatic Intelligibility Assessment of Dysarthric Speech Using Phonologically-Structured Sparse Linear Model

Cited by: 40
Authors
Kim, Myung Jong [1]
Kim, Younggwan [1]
Kim, Hoirin [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Taejon 305701, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Dysarthria; pronunciation confusion network; speech intelligibility assessment; structured sparse model; weighted finite state transducer (WFST); RECOGNITION; DISORDERS; SPEAKERS; ERRORS;
DOI
10.1109/TASLP.2015.2403619
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206; 082403;
Abstract
This paper presents a new method for automatically assessing the speech intelligibility of patients with dysarthria, which is a motor speech disorder impeding the physical production of speech. The proposed method consists of two main steps: feature representation and prediction. In the feature representation step, the speech utterance is converted into a phone sequence using an automatic speech recognition technique and is then aligned with a canonical phone sequence from a pronunciation dictionary using a weighted finite state transducer to capture the pronunciation mappings such as match, substitution, and deletion. The histograms of the pronunciation mappings on a pre-defined word set are used for features. Next, in the prediction step, a structured sparse linear model incorporated with phonological knowledge that simultaneously addresses phonologically structured sparse feature selection and intelligibility prediction is proposed. Evaluation of the proposed method on a database of 109 speakers consisting of 94 dysarthric and 15 control speakers yielded a root mean square error of 8.14 compared to subjectively rated scores in the range of 0 to 100. This is a promising performance in which the system can be successfully applied to help speech therapists in diagnosing the degree of speech disorder.
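The feature representation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper aligns the sequences with a weighted finite state transducer, whereas this sketch swaps in a plain unit-cost edit-distance alignment as a stand-in. The function names `align_phones` and `mapping_histogram` and the toy phone sequences are hypothetical.

```python
from collections import Counter


def align_phones(canonical, recognized):
    """Align two phone sequences by minimum edit distance (unit costs).

    Assumption: this replaces the paper's WFST alignment with a simple
    dynamic-programming alignment, for illustration only.
    Returns (canonical_phone, recognized_phone) pairs. None on the
    recognized side marks a deletion; None on the canonical side marks
    an insertion.
    """
    n, m = len(canonical), len(recognized)
    # cost[i][j] = edit distance between canonical[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right cell to recover the mappings.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and cost[i][j]
                == cost[i - 1][j - 1] + (canonical[i - 1] != recognized[j - 1])):
            pairs.append((canonical[i - 1], recognized[j - 1]))  # match or substitution
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canonical[i - 1], None))  # deletion
            i -= 1
        else:
            pairs.append((None, recognized[j - 1]))  # insertion
            j -= 1
    pairs.reverse()
    return pairs


def mapping_histogram(utterances):
    """Count pronunciation mappings over (canonical, recognized) pairs."""
    hist = Counter()
    for canonical, recognized in utterances:
        hist.update(align_phones(canonical, recognized))
    return hist


# Toy data (hypothetical): one word spoken twice, once with errors.
utterances = [
    (["k", "ae", "t"], ["g", "ae"]),
    (["k", "ae", "t"], ["k", "ae", "t"]),
]
hist = mapping_histogram(utterances)
```

In the method summarized above, histograms of this kind over a pre-defined word set would serve as the input features to the structured sparse linear model; the model itself is not sketched here.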
Pages: 694-704
Number of pages: 11