Regularized Speaker Adaptation of KL-HMM for Dysarthric Speech Recognition

Cited by: 34
Authors
Kim, Myungjong [1]
Kim, Younggwan [2]
Yoo, Joohong [2]
Wang, Jun [1]
Kim, Hoirin [2]
Affiliations
[1] Univ Texas Dallas, Dept Bioengn, Richardson, TX 75080 USA
[2] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon 305701, South Korea
Funding
National Institutes of Health (USA); National Research Foundation, Singapore
Keywords
Dysarthria; speech recognition; speaker adaptation; KL-HMM; regularization; Kullback-Leibler divergence; acoustic model
DOI
10.1109/TNSRE.2017.2681691
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
This paper addresses the recognition of speech produced by patients with dysarthria, a motor speech disorder that impedes the physical production of speech. Because patients with dysarthria have articulatory limitations, they often have trouble pronouncing certain sounds, which results in undesirable phonetic variation. Modern automatic speech recognition systems designed for regular speakers are ineffective for speakers with dysarthria because of this phonetic variation. To capture the variation, a Kullback-Leibler divergence-based hidden Markov model (KL-HMM) is adopted, in which the emission probability of each state is parameterized by a categorical distribution using phoneme posterior probabilities obtained from a deep neural network-based acoustic model. To further reflect speaker-specific phonetic variation patterns, a speaker adaptation method is proposed that combines L2 regularization with confusion-reducing regularization; this combination can enhance the discriminability between the categorical distributions of the KL-HMM states while preserving speaker-specific information. The method was evaluated on a database of several hundred words from 30 speakers (12 mildly dysarthric, 8 moderately dysarthric, and 10 non-dysarthric control speakers). The proposed approach significantly outperformed the conventional deep neural network-based speaker-adapted system on both dysarthric and non-dysarthric speech.
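As an illustration of the KL-HMM idea described in the abstract, the following minimal sketch scores a frame of phoneme posteriors against per-state categorical distributions using a Kullback-Leibler divergence. It is not the authors' implementation: the function names, the three-phoneme toy values, and the choice of divergence direction (state distribution relative to the posterior) are assumptions made for illustration, and the L2 and confusion-reducing regularization terms used for adaptation are not shown.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions given as lists.

    eps guards against log(0); terms with p_i == 0 contribute nothing.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def best_state(state_dists, posterior):
    """Return (index, costs): the state whose categorical distribution
    has the smallest KL divergence to the posterior vector, plus all costs."""
    costs = [kl_divergence(y, posterior) for y in state_dists]
    idx = min(range(len(costs)), key=costs.__getitem__)
    return idx, costs


# Hypothetical toy setup: two states, three phoneme classes.
state_dists = [
    [0.8, 0.1, 0.1],  # state 0 mostly emits phoneme 0
    [0.1, 0.8, 0.1],  # state 1 mostly emits phoneme 1
]
posterior = [0.7, 0.2, 0.1]  # assumed DNN phoneme posteriors for one frame

idx, costs = best_state(state_dists, posterior)
print(idx, costs)
```

In a full decoder, these per-frame costs would serve as local scores inside Viterbi decoding rather than being minimized frame by frame as in this toy example.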
Pages: 1581-1591
Page count: 11