Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition

Cited by: 34
Authors
Kim, Myungjong [1]
Kim, Younggwan [2]
Yoo, Joohong [2]
Wang, Jun [1]
Kim, Hoirin [2]
Affiliations
[1] Univ Texas Dallas, Dept Bioengn, Richardson, TX 75080 USA
[2] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon 305701, South Korea
Funding
US National Institutes of Health; National Research Foundation of Singapore
Keywords
Dysarthria; speech recognition; speaker adaptation; KL-HMM; regularization; Kullback-Leibler divergence; acoustic model
DOI
10.1109/TNSRE.2017.2681691
CLC Number
R318 [Biomedical Engineering]
Discipline Code
0831
Abstract
This paper addresses the problem of recognizing speech uttered by patients with dysarthria, a motor speech disorder that impedes the physical production of speech. Patients with dysarthria have articulatory limitations and therefore often have trouble pronouncing certain sounds, resulting in undesirable phonetic variation. Modern automatic speech recognition systems designed for typical speakers are ineffective for speakers with dysarthria because of this phonetic variation. To capture the phonetic variation, a Kullback-Leibler divergence-based hidden Markov model (KL-HMM) is adopted, in which each state's emission probability is parameterized by a categorical distribution over phoneme posterior probabilities obtained from a deep neural network (DNN)-based acoustic model. To further reflect speaker-specific phonetic variation patterns, a speaker adaptation method is proposed that combines L2 regularization with confusion-reducing regularization, enhancing the discriminability between the categorical distributions of the KL-HMM states while preserving speaker-specific information. Evaluated on a database of several hundred words from 30 speakers (12 mildly dysarthric, 8 moderately dysarthric, and 10 non-dysarthric control speakers), the proposed approach significantly outperformed a conventional DNN-based speaker-adapted system on both dysarthric and non-dysarthric speech.
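The abstract names two components that can be illustrated concretely: the KL-HMM local score, i.e. the Kullback-Leibler divergence between a state's categorical distribution and the frame-level DNN phoneme posterior, and an adaptation objective that combines an L2 penalty with a confusion-reducing term. The sketch below is a minimal illustration under stated assumptions: the divergence direction, the weights lam_l2 and lam_cr, and the pairwise-divergence form of the confusion-reducing term are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def kl_div(p, q, eps=1e-10):
    """KL divergence D(p || q) between two categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def state_score(y_s, z_t):
    """KL-HMM local score for a state at frame t: divergence between the
    state's categorical distribution y_s and the DNN phoneme posterior
    z_t (lower means a better match). One common divergence direction is
    shown; KL-HMMs also use the reverse and symmetric variants."""
    return kl_div(y_s, z_t)

def adaptation_loss(y_adapted, y_si, frames_per_state,
                    lam_l2=0.1, lam_cr=0.05):
    """Illustrative regularized adaptation objective (hypothetical form):
    data term : sum of KL-HMM scores over each state's aligned frames,
    L2 term   : pulls adapted distributions toward the
                speaker-independent ones (y_si),
    confusion : pairwise divergence between different states'
                distributions, subtracted so that minimizing the loss
                pushes the state distributions apart."""
    data = sum(state_score(y_adapted[s], z)
               for s, frames in frames_per_state.items() for z in frames)
    l2 = sum(np.sum((y_adapted[s] - y_si[s]) ** 2) for s in y_adapted)
    states = list(y_adapted)
    confusion = sum(kl_div(y_adapted[a], y_adapted[b])
                    for i, a in enumerate(states) for b in states[i + 1:])
    return data + lam_l2 * l2 - lam_cr * confusion

# Toy usage: 3 states, 40 phoneme classes, 5 aligned frames per state.
rng = np.random.default_rng(0)
y_si = {s: rng.dirichlet(np.ones(40)) for s in range(3)}
frames = {s: [rng.dirichlet(np.ones(40)) for _ in range(5)] for s in range(3)}
print(adaptation_loss({s: v.copy() for s, v in y_si.items()}, y_si, frames))
```

In the paper the adapted categorical distributions are re-estimated under such an objective; the sketch only evaluates the loss, which is enough to show how the data term and the two regularizers interact.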
Pages: 1581-1591 (11 pages)