The Joint Optimization of Spectro-Temporal Features and Neural Net Classifiers

Cited by: 0

Authors:

Kovacs, Gyoergy [1]

Toth, Laszlo [2]

Affiliations:

[1] Univ Szeged, Dept Informat, Szeged, Hungary

[2] Hungarian Acad Sci, Res Grp Artificial Intelligence, Szeged, Hungary

Source:

TEXT, SPEECH, AND DIALOGUE, TSD 2013 | 2013 / Vol. 8082

Keywords:

spectro-temporal features; Neural Net; phone recognition; TIMIT;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In speech recognition, spectro-temporal feature extraction and the training of the acoustical model are usually performed separately. To improve recognition performance, we present a combined model which allows the training of the feature extraction filters along with a neural net classifier. Besides expecting that this joint training will result in a better recognition performance, we also expect that such a neural net can generate coefficients for spectro-temporal filters and also enhance preexisting ones, such as those obtained with the two-dimensional Discrete Cosine Transform (2D DCT) and Gabor filters. We tested these assumptions on the TIMIT phone recognition task. The results show that while the initialization based on the 2D DCT or Gabor coefficients is better in some cases than with simple random initialization, the joint model in practice always outperforms the standard two-step method. Furthermore, the results can be significantly improved by using a convolutional version of the network.
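The joint training idea in the abstract can be illustrated with a minimal NumPy sketch. It assumes a single trainable filter layer, initialized from 2D DCT basis functions, followed by a softmax classifier, with one gradient step updating both. The patch size, filter count, tanh nonlinearity, learning rate, and the function names `dct2_filters` and `joint_step` are hypothetical choices for illustration and are not taken from the paper, whose actual architecture is not described in this record.

```python
import numpy as np

def dct2_filters(n_freq, n_time, k_freq, k_time):
    """Build 2D DCT-II basis filters, one flattened filter per row.

    Returns an array of shape (k_freq * k_time, n_freq * n_time).
    """
    def basis(n, k):
        # Orthonormal 1D DCT-II basis: rows are the first k cosine functions.
        i = np.arange(n)
        b = np.cos(np.pi * (i[None, :] + 0.5) * np.arange(k)[:, None] / n)
        b[0] *= np.sqrt(1.0 / n)
        b[1:] *= np.sqrt(2.0 / n)
        return b

    bf, bt = basis(n_freq, k_freq), basis(n_time, k_time)
    # Each 2D filter is the outer product of a frequency and a time basis vector.
    filt = np.einsum("af,bt->abft", bf, bt)
    return filt.reshape(k_freq * k_time, n_freq * n_time)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def joint_step(params, x, y, lr=0.1):
    """One gradient step that updates BOTH the filters F and the classifier (W, b).

    x: (n, d) flattened spectro-temporal patches, y: (n,) integer class labels.
    Returns the cross-entropy loss before the update.
    """
    F, W, b = params["F"], params["W"], params["b"]
    n = x.shape[0]
    h = np.tanh(x @ F.T)                  # filter outputs (assumed nonlinearity)
    p = softmax(h @ W + b)                # class posteriors
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()

    dz = p.copy()
    dz[np.arange(n), y] -= 1.0
    dz /= n                               # gradient w.r.t. the logits
    dW = h.T @ dz
    db = dz.sum(axis=0)
    da = (dz @ W.T) * (1.0 - h ** 2)      # backpropagate through tanh
    dF = da.T @ x                         # gradient w.r.t. the filter coefficients

    params["F"] = F - lr * dF
    params["W"] = W - lr * dW
    params["b"] = b - lr * db
    return loss
```

In a two-step setup, by contrast, `F` would stay fixed at its DCT (or Gabor) values and only `W` and `b` would be updated; the sketch differs only in that `dF` is also applied.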

Citation

Pages: 552-559

Page count: 8
