VARIABLE-ACTIVATION AND VARIABLE-INPUT DEEP NEURAL NETWORK FOR ROBUST SPEECH RECOGNITION

Cited by: 0
Authors
Zhao, Rui [1]
Li, Jinyu [2]
Gong, Yifan [2]
Affiliations
[1] Microsoft Search Technol Ctr Asia, Beijing, Peoples R China
[2] Microsoft Corp, Redmond, WA 98052 USA
Source
2014 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY SLT 2014 | 2014
Keywords
deep neural network; variable component; variable input; variable activation; robust speech recognition;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In a previous study, we proposed the variable-component deep neural network (VCDNN) to improve the robustness of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM). In VCDNN, the components of the DNN are modeled as a set of polynomial functions of an environment variable, specifically the signal-to-noise ratio (SNR). We refined VCDNN for two types of DNN components: (1) the weight matrix and bias, and (2) the output of each layer. These two methods are called variable-parameter DNN (VPDNN) and variable-output DNN (VODNN), respectively. Although both methods achieved good gains over the standard DNN, they doubled the number of parameters even with only the first-order environment variable. In this study, we propose two new types of VCDNN, namely variable-activation DNN (VADNN) and variable-input DNN (VIDNN). The environment variable is applied to the hidden-layer activation function in VADNN and directly to the input in VIDNN. Both DNNs add only a negligible number of parameters compared to the standard DNN. Experimental results on the Aurora4 task show that both methods are effective, and that VIDNN outperforms all other variations of VCDNN, with a 7.69% relative word error reduction from the standard DNN and the smallest increase in the number of parameters.
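The parameter-count argument in the abstract can be illustrated with a rough sketch. It assumes that VPDNN stores one coefficient set per polynomial term for every weight matrix and bias, and that VIDNN simply appends powers of the SNR to the input feature vector; the abstract does not spell out either detail. The function names and the layer sizes are hypothetical and are not taken from the paper.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases of a standard fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def vpdnn_param_count(layer_sizes, order=1):
    """Assumed VPDNN layout: every weight matrix and bias is a polynomial
    in SNR, so there are (order + 1) coefficient sets per layer."""
    return (order + 1) * dense_param_count(layer_sizes)


def vidnn_param_count(layer_sizes, order=1):
    """Assumed VIDNN layout: SNR, SNR^2, ..., SNR^order are appended to
    the input, so only the first layer gains extra weights."""
    augmented = [layer_sizes[0] + order] + list(layer_sizes[1:])
    return dense_param_count(augmented)


def augment_input(features, snr, order=1):
    """Append powers of the SNR value to one frame of acoustic features."""
    return list(features) + [snr ** j for j in range(1, order + 1)]


# Hypothetical topology, chosen only to make the comparison concrete.
sizes = [440, 2048, 2048, 2048, 2000]
base = dense_param_count(sizes)
print("standard DNN:", base)
print("VPDNN (order 1):", vpdnn_param_count(sizes), "parameters")
print("VIDNN (order 1):", vidnn_param_count(sizes) - base, "extra parameters")
```

Under these assumptions, a first-order VPDNN has exactly twice the parameters of the standard network, while a first-order VIDNN adds only one extra weight per first-hidden-layer unit.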
Pages: 542-547
Number of pages: 6