Convolutional Neural Network Bottleneck Features for bi-directional Generalized Variable Parameter HMMs

Times cited: 0
Authors
Su, Rongfeng [1 ,2 ]
Liu, Xunying [1 ,2 ]
Wang, Lan [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Guangdong Prov Key Lab Robot & Intelligent Syst, Beijing, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
Source
2016 IEEE International Conference on Information and Automation (ICIA) | 2016
Funding
National Natural Science Foundation of China;
Keywords
generalized variable parameter HMM; convolutional neural network; bottleneck features; robust speech recognition; SPEECH; FRAMEWORK;
DOI
Not available
CLC classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Recently, convolutional neural networks (CNNs) have been applied successfully to acoustic modelling in speech recognition. Because CNN bottleneck features are inherently discriminative and carry rich contextual information, the standard approach is to augment conventional acoustic features with CNN bottleneck features in a tandem framework. To better capture the highly complex relationship between the two feature types, this paper proposes a novel bi-directional generalized variable parameter HMM (GVP-HMM) based approach. In this approach, the continuous trajectories of acoustic-feature-space HMM parameters, as well as those of model-space linear transforms, are modelled as polynomial functions of the CNN bottleneck features. The optimal GVP-HMM model structure for each direction, determined by the locally varying polynomial parameters and degrees, can be learnt automatically using model selection techniques. The proposed bi-directional GVP-HMM based approach achieved a word error rate of 12.22% on the Aurora 4 task. In particular, on the secondary microphone channel condition it obtained a significant relative error rate reduction of 18.09% over the baseline tandem HMM system using CNN bottleneck features.
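The core GVP idea of letting a Gaussian parameter vary as a polynomial function of an auxiliary variable, with the polynomial degree chosen automatically, can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: a single one-dimensional Gaussian with a constant variance stands in for an HMM state, its mean is a polynomial in a scalar auxiliary variable `v` (a hypothetical stand-in for the CNN bottleneck features), the frame-to-state alignment is taken as known, and BIC is used as the model selection criterion. The bi-directional structure, the model-space linear transforms and the EM-based training described in the paper are not reproduced here.

```python
import numpy as np


def fit_poly_mean(v, x, degree):
    """Fit a Gaussian whose mean follows mu(v) = c0 + c1*v + ... + cP*v**P.

    With a constant variance, the maximum-likelihood coefficients are the
    ordinary least-squares solution of a Vandermonde system.
    """
    V = np.vander(v, degree + 1, increasing=True)  # columns v**0 .. v**degree
    coeffs, *_ = np.linalg.lstsq(V, x, rcond=None)
    resid = x - V @ coeffs
    var = max(float(np.mean(resid ** 2)), 1e-8)  # ML variance, floored for stability
    return coeffs, var


def log_likelihood(v, x, coeffs, var):
    """Total Gaussian log-likelihood of x given the polynomial mean trajectory."""
    mu = np.polynomial.polynomial.polyval(v, coeffs)
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (x - mu) ** 2 / var)))


def select_degree_bic(v, x, max_degree=3):
    """Pick the degree maximising log-likelihood - 0.5 * k * log(n) (BIC, assumed).

    k counts the polynomial coefficients plus the single variance parameter.
    """
    n = len(x)
    best = None
    for degree in range(max_degree + 1):
        coeffs, var = fit_poly_mean(v, x, degree)
        k = degree + 2
        score = log_likelihood(v, x, coeffs, var) - 0.5 * k * np.log(n)
        if best is None or score > best[0]:
            best = (score, degree, coeffs, var)
    _, degree, coeffs, var = best
    return degree, coeffs, var


if __name__ == "__main__":
    # Synthetic data: hypothetical auxiliary variable v and an acoustic
    # feature x whose mean drifts quadratically with v.
    rng = np.random.default_rng(0)
    v = rng.uniform(-1.0, 1.0, 500)
    x = 0.5 + 2.0 * v - 1.5 * v ** 2 + rng.normal(0.0, 0.1, 500)
    degree, coeffs, var = select_degree_bic(v, x, max_degree=4)
    print("selected degree:", degree)
    print("coefficients:", np.round(coeffs, 3))
    print("variance:", round(var, 4))
```

Scoring each candidate degree with a likelihood penalised by its parameter count keeps the selection from always favouring the highest degree; BIC is used here only as a common, simple choice of penalty.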
Pages: 1126-1131
Page count: 6