SPEAKER-INDEPENDENT CONSONANT CLASSIFICATION IN CONTINUOUS SPEECH WITH DISTINCTIVE FEATURES AND NEURAL NETWORKS

Cited by: 10

Authors:

DEMORI, R [1]

FLAMMIA, G [1]

Affiliation:

[1] MIT,COMP SCI LAB,SPOKEN LANGUAGE SYST GRP,CAMBRIDGE,MA 02139

Source:

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA | 1993 / Vol. 94 / No. 06

DOI:

10.1121/1.407243

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper provides experimental evidence for the assertion that the design of appropriate neural networks (NN) for speech recognition should be inspired by acoustic and phonetic knowledge, and not only by knowledge in pattern recognition. Rather than investigating the NN learning paradigm, the paper focuses on the influence of the input parameters, of the internal structure, and of the desired output representation on the classification performance of recurrent multilayer perceptrons. As an instructive example, the paper analyzes the problem of classifying ten stop and nasal consonants in continuous speech independently of the speaker. Experiments are reported for the TIMIT database, using 343 speakers in the training set and 77 different speakers in the test set. Comparative experiments show that good performance is obtained when many input acoustic parameters are used, including a time/frequency gradient operator related to transitions of the second formant, and when the desired outputs represent context-dependent articulatory features. Classification is performed by principal component analysis of the NN outputs. Refinement of the design parameters yields increasingly better performance on the test set, ranging from 45% errors for a perceptron without hidden nodes to 23.3% errors for the best NN.

Pages: 3091-3103

Number of pages: 13
