SEQUENCE CLASSIFICATION USING THE HIGH-LEVEL FEATURES EXTRACTED FROM DEEP NEURAL NETWORKS

Cited by: 0
Authors
Deng, Li [1 ]
Chen, Jianshu [1 ]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
deep neural net; feature extraction; ARMA recurrent neural net; phone recognition; SPEECH;
DOI
Not available
CLC Classification Number
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
The recent success of deep neural networks (DNNs) in speech recognition can be attributed largely to their ability to extract a specific form of high-level features from raw acoustic data for subsequent sequence classification or recognition tasks. Among the many possible forms of DNN features, which forms are more useful than others, and how effective these features are when paired with different types of downstream sequence recognizers, have remained unexplored and are the focus of this paper. We report our recent work on constructing a diverse set of DNN features, including vectors extracted from the output layer and from the various hidden layers of the DNN. We then apply these features as inputs to four types of classifiers carrying out the identical sequence classification task of phone recognition. The experimental results show that the features derived from the top hidden layer of the DNN perform the best for all four classifiers, especially for the autoregressive-moving-average (ARMA) version of a recurrent neural network. The feature vector derived from the DNN's output layer performs slightly worse than the top hidden layer, but better than any other hidden layer in the DNN.
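To make the feature-extraction idea described in the abstract concrete, the following is a minimal sketch in Python/PyTorch (which the paper does not specify) of how the two candidate feature types can be obtained from a frame-level acoustic DNN: the activations of the top hidden layer and the output-layer posteriors. All layer sizes, input dimensions, and names here are illustrative assumptions, not the configuration used by the authors, and the downstream sequence classifiers (including the ARMA recurrent network) are omitted.

import torch
import torch.nn as nn

# Hypothetical frame-level acoustic DNN; dimensions are illustrative only.
class FrameDNN(nn.Module):
    def __init__(self, input_dim=440, hidden_dim=2048, num_hidden=5, num_states=183):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(num_hidden):
            layers += [nn.Linear(dim, hidden_dim), nn.Sigmoid()]
            dim = hidden_dim
        self.hidden = nn.Sequential(*layers)   # stack of hidden layers
        self.output = nn.Linear(dim, num_states)

    def forward(self, x):
        h_top = self.hidden(x)                                   # top-hidden-layer features
        posteriors = torch.softmax(self.output(h_top), dim=-1)   # output-layer (posterior) features
        return h_top, posteriors

# Extract both feature types for a batch of stacked acoustic frames; either
# tensor could then be fed to a downstream sequence classifier.
dnn = FrameDNN()
frames = torch.randn(16, 440)            # 16 frames of stacked filterbank features (assumed shape)
with torch.no_grad():
    top_hidden_feats, output_feats = dnn(frames)
print(top_hidden_feats.shape, output_feats.shape)   # torch.Size([16, 2048]) torch.Size([16, 183])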
Pages: 5