Combining formant frequency based on variable order LPC coding with acoustic features for TIMIT phone recognition

Cited by: 13

Authors:

Messaoud, Zaineb [1]

Hamida, Ahmed [1]

Affiliations:

[1] Sfax Univ, ATMS LETI, ENIS, Technol Informat & Elect Med, Sfax, Tunisia

Source:

INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY | 2011 / Volume 14 / Issue 04

Keywords:

Automatic speech recognition (ASR); Phone recognition; Hidden Markov Model (HMM); Variable Order LPC Coding algorithm; Formants feature; Acoustic feature; HLDA;

DOI:

10.1007/s10772-011-9119-z

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Combining multiple acoustic features has great potential to improve Automatic Speech Recognition (ASR) accuracy. Our contribution in this research was to investigate a novel method that uses voiced formant features in combination with standard MFCC features in order to enhance TIMIT phone recognition. The voiced features are formant frequencies estimated with a Variable Order LPC Coding (VO-LPC) algorithm combined with continuity constraints. The estimated formants were concatenated with the MFCC features whenever a voiced frame was detected. For feature-level combination, a Heteroscedastic Linear Discriminant Analysis (HLDA) based approach was used to find an optimal linear combination of successive vectors of a single feature stream. A series of speaker-independent, continuous-speech phone recognition experiments was carried out on a subset of the large read-speech TIMIT phone corpus, using the Hidden Markov Model Toolkit (HTK) throughout. With this feature-level combination, optimized mixture splitting, and a bigram language model, the performance of the combination was analyzed in detail for Context-Independent (CI) and Context-Dependent (CD) Hidden Markov Models (HMMs). Experimental results showed that the proposed procedure decreased the phone error rate by about 3%. At the level of phonetic groups, an increase of 8% and 10% was observed for the vowel and liquid groups, respectively. These results indicate a clear improvement in phone recognition relative to the existing state of the art.
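The pipeline described in the abstract (LPC-based formant estimation, then concatenation with MFCCs on voiced frames) can be illustrated with a minimal sketch. This is not the authors' VO-LPC algorithm: it uses a fixed LPC order and no continuity constraints, and the function names, the 90 Hz and 400 Hz thresholds, and the zero-filling of unvoiced frames are all assumptions made for illustration.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns the polynomial [1, a1, ..., ap]."""
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    # Keep autocorrelation lags 0..order (lag 0 sits at index len - 1).
    r = full[len(windowed) - 1 : len(windowed) + order]
    # Solve the Toeplitz normal equations R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))


def estimate_formants(frame, sample_rate, order=12, max_bandwidth_hz=400.0):
    """Fixed-order formant candidates from LPC pole angles (illustrative only)."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]  # one pole per conjugate pair
    freqs = np.angle(roots) * sample_rate / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * sample_rate / np.pi
    # Assumed thresholds: drop near-DC poles and very broad resonances.
    keep = (freqs > 90.0) & (bandwidths < max_bandwidth_hz)
    return np.sort(freqs[keep])


def append_formants(mfcc, formants, voiced):
    """Concatenate formants to MFCC frames; unvoiced frames get zeros (assumption)."""
    padded = np.where(voiced[:, None], formants, 0.0)
    return np.hstack([mfcc, padded])
```

For example, `estimate_formants(frame, 16000)` returns a sorted array of candidate formant frequencies in Hz for one frame, and `append_formants(mfcc, formants, voiced)` expects arrays of shape `(T, D)`, `(T, K)`, and `(T,)`. A silent (all-zero) frame makes the normal equations singular, so a real front end would run voicing detection first. The HLDA projection and HTK training stages mentioned in the abstract are not covered here.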

Pages: 393-403

Page count: 11
