Speaker Adaptation of Hybrid NN/HMM Model for Speech Recognition Based on Singular Value Decomposition

Cited by: 15
Authors
Xue, Shaofei [1]
Jiang, Hui [2]
Dai, Lirong [1]
Liu, Qingfeng [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei 230026, Peoples R China
[2] York Univ, Dept Elect Engn & Comp Sci, Toronto, ON M3J 2R7, Canada
Keywords
Deep neural network (DNN); Hybrid DNN/HMM; Speaker adaptation; Singular value decomposition (SVD); TRANSFORMATIONS;
DOI
10.1007/s11265-015-1012-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Several speaker adaptation methods have recently been proposed for deep neural networks (DNNs) in large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few of them directly tune the connection weights of a trained DNN to optimize system performance, because doing so is very prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of a trained DNN and then tune the resulting rectangular diagonal matrices with the adaptation data. This alleviates over-fitting because the weight matrices are updated only slightly, by modifying the singular values alone. We evaluate the proposed adaptation method on two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition on Switchboard. Experimental results show that the method is effective for adapting large DNN models using only a small amount of adaptation data. For example, recognition results on the Switchboard task show that the proposed SVD-based adaptation method may achieve up to 3-6% relative error reduction using only a few dozen adaptation utterances per speaker.
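The core idea in the abstract (factor a trained weight matrix with SVD, then adapt only the singular values) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses a single linear layer, a toy squared-error loss instead of the DNN training criterion, a thin SVD (so the diagonal factor is stored as a vector), and synthetic data. The function names, learning rate, step count, and the 1.2 scaling used to build a hypothetical speaker-specific target are all assumptions made for illustration.

```python
import numpy as np


def svd_factorize(W):
    """Thin SVD: W == U @ np.diag(s) @ Vt, with s holding the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt


def layer_output(U, s, Vt, X):
    """Outputs of the linear layer y = W x for each row of X, with W = U diag(s) Vt."""
    return ((X @ Vt.T) * s) @ U.T


def mse_loss(U, s, Vt, X, Y):
    """Toy adaptation loss (assumption): half the mean squared error over the batch."""
    diff = layer_output(U, s, Vt, X) - Y
    return 0.5 * np.sum(diff ** 2) / X.shape[0]


def adapt_singular_values(U, s, Vt, X, Y, lr=0.1, steps=50):
    """Gradient descent on the singular values only; U and Vt stay fixed."""
    s = s.copy()
    A = X @ Vt.T  # inputs projected onto the right singular vectors
    for _ in range(steps):
        G = ((A * s) @ U.T - Y) / X.shape[0]  # gradient of the loss w.r.t. the outputs
        grad_s = np.sum(A * (G @ U), axis=0)  # gradient of the loss w.r.t. s
        s -= lr * grad_s
    return s


# Synthetic stand-ins for a trained weight matrix and adaptation data.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 8))        # "speaker-independent" weights
U, s, Vt = svd_factorize(W)
X = rng.standard_normal((20, 8))       # a small batch of adaptation inputs
Y = layer_output(U, 1.2 * s, Vt, X)    # hypothetical speaker-specific targets

s_adapted = adapt_singular_values(U, s, Vt, X, Y)
loss_before = mse_loss(U, s, Vt, X, Y)
loss_after = mse_loss(U, s_adapted, Vt, X, Y)
print(loss_before, loss_after)
```

In this sketch only `min(6, 8) = 6` values are adapted instead of the 48 entries of `W`, which reflects the abstract's point that restricting updates to the singular values keeps the adapted model close to the original and limits over-fitting on small adaptation sets.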
Pages: 175-185
Page count: 11