Bilinear map of filter-bank outputs for DNN-based speech recognition

Cited by: 0
Authors
Ogawa, Tetsuji [1]
Ueda, Kenshiro [1]
Katsurada, Kouichi [2]
Kobayashi, Tetsunori [1]
Nitta, Tsuneo [1, 2]
Affiliations
[1] Waseda Univ, Dept Comp Sci, Tokyo, Japan
[2] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
bilinear map; tensor; feature extraction; deep neural network; speech recognition;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Filter-bank outputs are extended into tensors to yield precise acoustic features for speech recognition using deep neural networks (DNNs). The filter-bank outputs with temporal contexts form a time-frequency pattern of speech and have been shown to be effective as a feature parameter for DNN-based acoustic models. We attempt to project the filter-bank outputs onto a tensor product space using decorrelation followed by a bilinear map to improve acoustic separability in feature extraction. This extension makes extracting a more precise structure of the time-frequency pattern possible because the bilinear map yields higher-order correlations of features. Experimental comparisons carried out in phoneme recognition demonstrate that the tensor feature provides comparable results to the filter-bank feature, and the fusion of the two features yields an improvement over each feature.
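The abstract's pipeline (decorrelation followed by a bilinear map onto a tensor product space) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which decorrelating transform or which bilinear map the paper uses, so the sketch assumes an orthonormal DCT-II as the decorrelation step and the outer product of two frames as the bilinear map. The names `dct_ii`, `outer`, and `tensor_feature`, and the toy input values, are hypothetical.

```python
import math


def dct_ii(vec):
    """Orthonormal DCT-II, used here only as a stand-in decorrelating transform."""
    n = len(vec)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(vec))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def outer(u, v):
    """Canonical bilinear map (u, v) -> u (x) v, as a len(u) x len(v) matrix."""
    return [[a * b for b in v] for a in u]


def tensor_feature(frame_a, frame_b):
    """Decorrelate two filter-bank frames, then flatten their outer product."""
    ta = dct_ii(frame_a)
    tb = dct_ii(frame_b)
    return [x for row in outer(ta, tb) for x in row]


# Toy filter-bank outputs for two neighbouring frames (made-up numbers).
frame_t = [1.0, 2.0, 3.0, 4.0]
frame_t1 = [2.0, 2.0, 1.0, 0.0]

feat = tensor_feature(frame_t, frame_t1)
print(len(feat))  # 16: every pairwise product of the two 4-dimensional frames
```

Each entry of the flattened result is a product of one coefficient from each frame, which is the sense in which a bilinear map exposes second-order correlations that a plain concatenation of the two frames does not contain.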
Pages: 16-20
Page count: 5
Related papers
16 in total
  • [1] Abdel-Hamid O, 2012, INT CONF ACOUST SPEE, P4277, DOI 10.1109/ICASSP.2012.6288864
  • [2] [Anonymous], 2009, ADV NEURAL INFORM PR
  • [3] ATHINEOS M, 2004, P ICSLP, P1154
  • [4] Temporal envelope compensation for robust phoneme recognition using modulation spectrum
    Ganapathy, Sriram
    Thomas, Samuel
    Hermansky, Hynek
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2010, 128 (06) : 3769 - 3780
  • [5] Hermansky H., 1998, P ICSLP1998 NOV
  • [6] Deep Neural Networks for Acoustic Modeling in Speech Recognition
    Hinton, Geoffrey
    Deng, Li
    Yu, Dong
    Dahl, George E.
    Mohamed, Abdel-rahman
    Jaitly, Navdeep
    Senior, Andrew
    Vanhoucke, Vincent
    Nguyen, Patrick
    Sainath, Tara N.
    Kingsbury, Brian
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 82 - 97
  • [7] A fast learning algorithm for deep belief nets
    Hinton, Geoffrey E.
    Osindero, Simon
    Teh, Yee-Whye
    [J]. NEURAL COMPUTATION, 2006, 18 (07) : 1527 - 1554
  • [8] HUTCHINSON B, 2012, 2012 IEEE INT C AC, P4805
  • [9] Tensor Deep Stacking Networks
    Hutchinson, Brian
    Deng, Li
    Yu, Dong
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (08) : 1944 - 1957
  • [10] SPEAKER-INDEPENDENT PHONE RECOGNITION USING HIDDEN MARKOV-MODELS
    LEE, KF
    HON, HW
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (11) : 1641 - 1648