Deep Neural Network based Feature Extraction Using Convex-nonnegative Matrix Factorization for Low-resource Speech Recognition

Cited by: 0
Authors
Qin, Chuxiong [1]
Zhang, Lianhai [1]
Affiliation
[1] Zhengzhou Information Science and Technology Institute, Zhengzhou, People's Republic of China
Source
2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC) | 2016
Keywords
convex-nonnegative matrix factorization; deep neural network; low-dimensional features; low-resource speech recognition;
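DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract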
Bottleneck features (BNF), together with Gaussian mixture models, have achieved great success compared with acoustic features in low-resource speech recognition. However, the presence of a bottleneck (BN) layer decreases the classification accuracy of deep neural networks (DNNs). In this paper, we investigate a better way of extracting DNN-based low-dimensional features using convex-nonnegative matrix factorization (CNMF). First, a DNN is trained without a BN layer. Second, CNMF is applied to the weight matrix of a hidden layer to form a low-dimensional feature extraction layer. Finally, a new type of high-level feature is extracted by forward-passing the input acoustic features. Experiments show that the new feature yields a 1.6-4.6% gain over the BNF baseline system on English and Czech low-resource tasks. When dropout and maxout are introduced, an additional 3.1-5.6% gain over the BNF baseline system is observed, while training time is reduced.
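The factorization step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses the multiplicative Convex-NMF updates from Ding, Li, and Jordan (2010), and the function name `convex_nmf`, the matrix sizes, the iteration count, and the way the factor `F = M @ W` is used as a linear low-dimensional layer are all assumptions made for this example. The abstract does not specify these details.

```python
import numpy as np

def convex_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Convex-NMF sketch: X ~ (X @ W) @ G.T with W >= 0 and G >= 0.

    X may contain mixed-sign entries (as DNN weights do); only the
    factors W and G are constrained to be nonnegative.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Strictly positive random initialisation (an assumption of this sketch).
    W = rng.random((n, k)) + 0.1
    G = rng.random((n, k)) + 0.1
    A = X.T @ X
    A_pos = (np.abs(A) + A) / 2   # positive part of X^T X
    A_neg = (np.abs(A) - A) / 2   # magnitude of the negative part
    for _ in range(n_iter):
        GWt = G @ W.T
        G *= np.sqrt((A_pos @ W + GWt @ A_neg @ W + eps)
                     / (A_neg @ W + GWt @ A_pos @ W + eps))
        GtG = G.T @ G
        W *= np.sqrt((A_pos @ G + A_neg @ W @ GtG + eps)
                     / (A_neg @ G + A_pos @ W @ GtG + eps))
    return W, G

# Hypothetical hidden-layer weight matrix (inputs x outputs); sizes are illustrative.
rng = np.random.default_rng(1)
M = rng.standard_normal((64, 48))
W, G = convex_nmf(M, k=8)

# Assumed usage: F maps the previous layer's 64 activations to 8 features,
# and G.T maps those 8 features back to the original 48 outputs.
F = M @ W
h_prev = rng.standard_normal((5, 64))   # 5 frames of made-up activations
feats = h_prev @ F                      # low-dimensional features, shape (5, 8)
```

In this sketch the original layer `M` is replaced by the product of a narrow linear layer `F` and an expansion `G.T`, so the low-dimensional features come from a network that was trained without a bottleneck. Whether the paper applies a nonlinearity or fine-tuning after factorization is not stated in the abstract.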
Pages: 1082-1086
Page count: 5