High resolution speech feature parametrization for monophone-based stressed speech recognition

Cited by: 43

Authors:

Sarikaya, R [1]

Hansen, JHL [1]

Affiliations:

[1] Univ Colorado, Ctr Spoken Language Res, Robust Speech Proc Lab, Boulder, CO 80309 USA

Source:

IEEE SIGNAL PROCESSING LETTERS | 2000 / Vol. 7 / Issue 07

Keywords:

feature extraction; speech recognition; speech under stress; wavelet analysis

DOI:

10.1109/97.847363

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

This letter investigates the impact of stress on monophone speech recognition accuracy and proposes a new set of acoustic parameters based on high resolution wavelet analysis. The two parameter schemes are entitled wavelet packet parameters (WPP) and subband-based cepstral parameters (SBC). The performance of these features is compared to traditional Mel-frequency cepstral coefficients (MFCC) for stressed speech monophone recognition. The stressed speaking styles considered are neutral, angry, loud, and Lombard effect speech from the SUSAS database. An overall monophone recognition improvement of 20.4% and 17.2% is achieved for loud and angry stressed speech, with a corresponding increase in the neutral monophone rate of 9.9% over MFCC parameters.

Pages: 182-185

Page count: 4
