Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition

Cited by: 34

Authors:

Kim, Myungjong [1]

Kim, Younggwan [2]

Yoo, Joohong [2]

Wang, Jun [1]

Kim, Hoirin [2]

Affiliations:

[1] Univ Texas Dallas, Dept Bioengn, Richardson, TX 75080 USA

[2] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon 305701, South Korea

Source:

IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING | 2017 / Vol. 25 / No. 09

Funding:

National Institutes of Health (USA); National Research Foundation, Singapore;

Keywords:

Dysarthria; speech recognition; speaker adaptation; KL-HMM; regularization; KULLBACK-LEIBLER DIVERGENCE; ACOUSTIC MODEL;

DOI:

10.1109/TNSRE.2017.2681691

Chinese Library Classification (CLC) number:

R318 [Biomedical Engineering];

Discipline classification code:

0831;

Abstract:

This paper addresses the problem of recognizing speech uttered by patients with dysarthria, a motor speech disorder that impedes the physical production of speech. Patients with dysarthria have articulatory limitations and therefore often have trouble pronouncing certain sounds, resulting in undesirable phonetic variation. Modern automatic speech recognition systems designed for regular speakers are ineffective for dysarthric speakers because of this phonetic variation. To capture the phonetic variation, a Kullback-Leibler divergence-based hidden Markov model (KL-HMM) is adopted, in which the emission probability of each state is parameterized by a categorical distribution using phoneme posterior probabilities obtained from a deep neural network-based acoustic model. To further reflect speaker-specific phonetic variation patterns, a speaker adaptation method is proposed that combines L2 regularization with confusion-reducing regularization, which can enhance discriminability between the categorical distributions of the KL-HMM states while preserving speaker-specific information. Evaluation of the proposed speaker adaptation method on a database of several hundred words for 30 speakers, consisting of 12 mildly dysarthric, 8 moderately dysarthric, and 10 non-dysarthric control speakers, showed that the proposed approach significantly outperformed the conventional deep neural network-based speaker-adapted system on dysarthric as well as non-dysarthric speech.
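As a rough illustration of the KL-HMM idea summarized in the abstract, the sketch below scores a phoneme posterior vector against per-state categorical distributions using one common form of the local score, KL(state || posterior). It is not taken from the paper: the function names `kl_divergence` and `best_state`, the three-phoneme toy values, and the choice of KL direction are all assumptions made for illustration, and the paper's regularized adaptation is not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions given as lists.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def best_state(states, posterior):
    """Return (index, costs): the state with the lowest KL cost and all costs.

    `states` holds one categorical distribution per HMM state (hypothetical
    values here); `posterior` is a phoneme posterior vector for one frame.
    """
    costs = [kl_divergence(y, posterior) for y in states]
    idx = min(range(len(states)), key=costs.__getitem__)
    return idx, costs


# Toy example with three phoneme classes (illustrative numbers only).
states = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
posterior = [0.15, 0.70, 0.15]

idx, costs = best_state(states, posterior)
print(idx, [round(c, 4) for c in costs])
```

In a full recognizer these per-frame costs would replace the negative log emission likelihood inside Viterbi decoding; the sketch covers only the local score for a single frame.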


Pages: 1581-1591

Number of pages: 11

50 entries in total

[41] Speech recognition for a distant moving speaker based on HMM composition and separation
Takiguchi, T
Nakamura, S
Shikano, K
2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000: 1403 - 1406
[42] A speaker clustering algorithm for fast speaker adaptation in continuous speech recognition
Rodríguez, LJ
Torres, MI
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 : 433 - 440
[43] Speaker adaptation by modeling the speaker variation in a continuous speech recognition system
Strom, N
ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996: 989 - 992
[44] Two-Step Unsupervised Speaker Adaptation Based on Speaker and Gender Recognition and HMM Combination
Cerva, Petr
Nouza, Jan
Silovsky, Jan
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 2326 - 2329
[45] Optimization of dysarthric speech recognition
Chen, FX
Kostov, A
PROCEEDINGS OF THE 19TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOL 19, PTS 1-6: MAGNIFICENT MILESTONES AND EMERGING OPPORTUNITIES IN MEDICAL ENGINEERING, 1997, 19 : 1436 - 1439
[46] Very low bit rate speech coding based on HMM with speaker adaptation
Masuko, Takashi
Kobayashi, Takao
Tokuda, Keiichi
Systems and Computers in Japan, 2006, 37 (02): 67 - 78
[47] An On-line Speaker Adaptation Method for HMM-based Speech Recognizers
Banhalmi, Andras
Kocsor, Andras
ACTA CYBERNETICA, 2008, 18 (03): 379 - 390
[48] Nearest Neighbor Approach in Speaker Adaptation for HMM-based Speech Synthesis
Mohammadi, Amir
Demiroglu, Cenk
2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,
[49] CROSS-LINGUAL SPEAKER ADAPTATION FOR HMM-BASED SPEECH SYNTHESIS
Wu, Yi-Jian
King, Simon
Tokuda, Keiichi
2008 6TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, 2008: 9 - 12
[50] Channel and speaker adaptation techniques for robust speech recognition
Chen, Jingdong
Yao, Lei
Huang, Taiyi
Shengxue Xuebao/Acta Acustica, 1998, 23 (06): 537 - 544
