Deep Neural Network based Feature Extraction Using Convex-nonnegative Matrix Factorization for Low-resource Speech Recognition

Cited by: 0

Authors:

Qin, Chuxiong [1]

Zhang, Lianhai [1]

Affiliations:

[1] Zhengzhou Information Science and Technology Institute, Zhengzhou, People's Republic of China

Source:

2016 IEEE INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC) | 2016

Keywords:

convex-nonnegative matrix factorization; deep neural network; low-dimensional features; low-resource speech recognition

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Bottleneck features (BNF), together with Gaussian mixture models, have achieved great success compared with acoustic features in low-resource speech recognition. However, the presence of a bottleneck (BN) layer decreases the classification accuracy of deep neural networks (DNNs). In this paper, we investigate a better way of extracting DNN-based low-dimensional features using convex-nonnegative matrix factorization (CNMF). First, a DNN is trained without a BN layer. Second, CNMF is applied to the weight matrix of a hidden layer to form a low-dimensional feature extraction layer. Finally, a new type of high-level feature is extracted by forward-passing the input acoustic features through the network. Experiments show that the new feature yields a 1.6-4.6% gain over the BNF baseline system on English and Czech low-resource tasks. When dropout and maxout are introduced, a 3.1-5.6% additional gain over the BNF baseline system is observed, while the training time is reduced.
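
The three steps in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The multiplicative updates follow the Convex-NMF rules of Ding, Li, and Jordan (2010), which approximate a mixed-sign matrix X as X W G^T with W, G >= 0. The function names `convex_nmf` and `extract_features`, the random initialization, the layer sizes, and the choice of `X @ W` as the low-dimensional projection are all hypothetical, and a random matrix stands in for a trained hidden-layer weight matrix.

```python
import numpy as np

def convex_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate X (p x n) by X @ W @ G.T with nonnegative W, G (both n x k).

    Uses the multiplicative updates of Convex-NMF (Ding, Li, Jordan, 2010).
    The random initialization is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.random((n, k)) + 0.1
    G = rng.random((n, k)) + 0.1

    # Split Y = X^T X into its positive and negative parts.
    Y = X.T @ X
    Yp = (np.abs(Y) + Y) / 2
    Yn = (np.abs(Y) - Y) / 2

    for _ in range(n_iter):
        # Update G with W fixed.
        GWt = G @ W.T
        num = Yp @ W + GWt @ (Yn @ W)
        den = Yn @ W + GWt @ (Yp @ W) + eps
        G *= np.sqrt(num / den)

        # Update W with G fixed.
        GtG = G.T @ G
        num = Yp @ G + Yn @ W @ GtG
        den = Yn @ G + Yp @ W @ GtG + eps
        W *= np.sqrt(num / den)
    return W, G

def extract_features(h_in, X, W):
    """Project hidden activations h_in (T x p) to k dimensions via F = X @ W.

    Assumes the original layer computes h_in @ X, so that
    h_in @ X is approximated by (h_in @ F) @ G.T.
    """
    return h_in @ (X @ W)

# Stand-in for a trained hidden-layer weight matrix (40 inputs, 64 units).
X = np.random.default_rng(1).standard_normal((40, 64))
W, G = convex_nmf(X, k=8)

# Stand-in for the activations of 5 frames feeding that layer.
h_in = np.random.default_rng(2).standard_normal((5, 40))
feats = extract_features(h_in, X, W)  # shape (5, 8)
```

In this reading, the k columns of `X @ W` play the role of the low-dimensional feature extraction layer. Whether the paper uses this factor, applies a nonlinearity, or fine-tunes the network afterwards is not stated in the abstract.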

Pages: 1082-1086

Page count: 5
