Convolutional Neural Network Bottleneck Features for bi-directional Generalized Variable Parameter HMMs

Citations: 0
Authors
Su, Rongfeng [1 ,2 ]
Liu, Xunying [1 ,2 ]
Wang, Lan [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Prov Key Lab Robot & Intelligent Syst, Beijing, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
Source
2016 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION (ICIA) | 2016
Funding
National Natural Science Foundation of China;
Keywords
generalized variable parameter HMM; convolutional neural network; bottleneck features; robust speech recognition; SPEECH; FRAMEWORK;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, convolutional neural networks (CNNs) have been applied successfully to acoustic modelling in speech recognition. Because the bottleneck features from CNNs contain inherently discriminative and rich context information, the standard approach is to augment the conventional acoustic features with the CNN bottleneck features in a tandem framework. To better capture the highly complex relationship between the two feature types, this paper proposes a novel bi-directional generalized variable parameter HMM (GVP-HMM) based approach. In this approach, the trajectories of the HMM parameters in the continuous acoustic feature space, as well as those of the model space linear transforms, against the CNN bottleneck features are modelled by polynomial functions. The optimal GVP-HMM model structure for each direction, which is determined by the locally varying polynomial parameters and degrees, can be learnt automatically using model selection techniques. The proposed bi-directional GVP-HMM based approach gave a word error rate of 12.22% on the Aurora 4 task. In particular, a significant relative error rate reduction of 18.09% was obtained over the baseline tandem HMM system using CNN bottleneck features on the secondary microphone channel condition.
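As a rough illustration of the polynomial modelling idea described in the abstract, the sketch below evaluates a Gaussian mean as a polynomial function of a single auxiliary value and scores an observation against it. It is not the authors' implementation: the function names, the scalar auxiliary feature `v` (standing in for one CNN bottleneck dimension), the coefficient values, and the fixed polynomial degree are all assumptions made for illustration, and the model selection step that chooses the degree is not shown.

```python
import numpy as np


def gvp_mean(coeffs, v):
    """Evaluate a polynomial mean trajectory mu(v) = sum_p coeffs[p] * v**p.

    coeffs: array of shape (P + 1, D), one D-dimensional coefficient
            vector per polynomial degree (hypothetical values).
    v:      scalar auxiliary feature (assumed here to be one bottleneck value).
    """
    powers = v ** np.arange(coeffs.shape[0])  # [1, v, v^2, ...]
    return powers @ coeffs


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at observation x."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


# Hypothetical degree-2 trajectory for a 2-dimensional acoustic feature.
coeffs = np.array([[0.0, 1.0],
                   [0.5, -0.2],
                   [0.1, 0.0]])
var = np.ones(2)

v = 1.0                      # auxiliary feature value for this frame
x = np.array([0.6, 0.8])     # acoustic observation for this frame

mean = gvp_mean(coeffs, v)   # the mean moves as v changes
score = log_gaussian(x, mean, var)
```

In this toy setting the Gaussian mean is recomputed per frame from `v`, so the same state can fit observations under different conditions without storing a separate mean for each one.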
Pages: 1126-1131
Page count: 6