The Joint Optimization of Spectro-Temporal Features and Neural Net Classifiers

Cited by: 0
Authors
Kovacs, Gyoergy [1]
Toth, Laszlo [2]
Affiliations
[1] Univ Szeged, Dept Informat, Szeged, Hungary
[2] Hungarian Acad Sci, Res Grp Artificial Intelligence, Szeged, Hungary
Source
TEXT, SPEECH, AND DIALOGUE, TSD 2013 | 2013 / Vol. 8082
Keywords
spectro-temporal features; Neural Net; phone recognition; TIMIT
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In speech recognition, spectro-temporal feature extraction and the training of the acoustical model are usually performed separately. To improve recognition performance, we present a combined model which allows the training of the feature extraction filters along with a neural net classifier. Besides expecting that this joint training will result in a better recognition performance, we also expect that such a neural net can generate coefficients for spectro-temporal filters and also enhance preexisting ones, such as those obtained with the two-dimensional Discrete Cosine Transform (2D DCT) and Gabor filters. We tested these assumptions on the TIMIT phone recognition task. The results show that while the initialization based on the 2D DCT or Gabor coefficients is better in some cases than with simple random initialization, the joint model in practice always outperforms the standard two-step method. Furthermore, the results can be significantly improved by using a convolutional version of the network.
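As a rough illustration of the 2D DCT initialization mentioned in the abstract, the sketch below builds a small bank of 2D DCT filters and flattens it into a weight matrix. Such a matrix could serve as the starting weights of a linear first layer that is then trained together with the classifier. The 9x9 patch size, the 3x3 set of retained coefficients, the unnormalised DCT-II basis, and the function names `dct_basis` and `dct2_filter_bank` are all assumptions made for this example; the record does not state the authors' actual settings.

```python
import numpy as np

def dct_basis(n, k):
    """First k rows of the unnormalised DCT-II basis for length-n signals."""
    positions = np.arange(n)[None, :]
    orders = np.arange(k)[:, None]
    return np.cos(np.pi * (positions + 0.5) * orders / n)

def dct2_filter_bank(n_freq, n_time, k_freq, k_time):
    """Flattened 2D DCT filters, one filter per row.

    Each filter is the outer product of one frequency-axis basis vector
    and one time-axis basis vector (hypothetical helper for illustration).
    """
    bf = dct_basis(n_freq, k_freq)              # (k_freq, n_freq)
    bt = dct_basis(n_time, k_time)              # (k_time, n_time)
    filters = np.einsum("if,jt->ijft", bf, bt)  # (k_freq, k_time, n_freq, n_time)
    return filters.reshape(k_freq * k_time, n_freq * n_time)

# Assumed sizes: a 9x9 spectro-temporal patch, 3x3 low-order coefficients.
rng = np.random.default_rng(0)
patch = rng.standard_normal((9, 9))

W = dct2_filter_bank(9, 9, 3, 3)   # candidate initial weights of a linear layer
features = W @ patch.ravel()       # filter outputs for this patch

# The same coefficients computed separably, as a consistency check.
reference = dct_basis(9, 3) @ patch @ dct_basis(9, 3).T
```

In a joint model of the kind the abstract describes, `W` would only be the starting point: gradient updates from the classifier's loss would then be free to move the filter coefficients away from the DCT values.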
Pages: 552-559
Number of pages: 8