Bilinear map of filter-bank outputs for DNN-based speech recognition

Cited by: 0

Authors:

Ogawa, Tetsuji [1]

Ueda, Kenshiro [1]

Katsurada, Kouichi [2]

Kobayashi, Tetsunori [1]

Nitta, Tsuneo [1,2]

Affiliations:

[1] Waseda Univ, Dept Comp Sci, Tokyo, Japan

[2] Toyohashi Univ Technol, Toyohashi, Aichi, Japan

Source:

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015

Keywords:

bilinear map; tensor; feature extraction; deep neural network; speech recognition

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Filter-bank outputs are extended into tensors to yield precise acoustic features for speech recognition using deep neural networks (DNNs). The filter-bank outputs with temporal contexts form a time-frequency pattern of speech and have been shown to be effective as a feature parameter for DNN-based acoustic models. We attempt to project the filter-bank outputs onto a tensor product space using decorrelation followed by a bilinear map to improve acoustic separability in feature extraction. This extension makes extracting a more precise structure of the time-frequency pattern possible because the bilinear map yields higher-order correlations of features. Experimental comparisons carried out in phoneme recognition demonstrate that the tensor feature provides comparable results to the filter-bank feature, and the fusion of the two features yields an improvement over each feature.
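
The abstract's "decorrelation followed by a bilinear map" can be illustrated with a minimal sketch. The abstract does not specify the transform, the dimensions, or the data, so everything here is an assumption: PCA stands in for the decorrelation step, the outer product of the decorrelated vector with itself stands in for the bilinear map, and the function names, the toy random data, and the choice of `k` are hypothetical.

```python
import numpy as np

def fit_decorrelation(frames):
    """Estimate a PCA basis from feature vectors stored as rows (assumed decorrelation)."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # sort components by descending variance
    return mean, eigvecs[:, order]

def bilinear_feature(x, mean, basis, k):
    """Project x onto the top-k components, then form the k x k outer product."""
    z = (x - mean) @ basis[:, :k]
    return np.outer(z, z)  # entry (i, j) is the second-order term z[i] * z[j]

# Toy stand-in for filter-bank vectors: 500 correlated 8-dimensional frames.
rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))

mean, basis = fit_decorrelation(frames)
tensor = bilinear_feature(frames[0], mean, basis, k=4)
print(tensor.shape)  # (4, 4)
```

Each entry of `tensor` is a product of two decorrelated components, which is one simple way to expose the higher-order correlations the abstract refers to. In a pipeline like the one described, such a tensor would presumably be flattened and fed to the DNN alongside or instead of the raw filter-bank outputs.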

Pages: 16-20

Page count: 5
