High resolution speech feature parametrization for monophone-based stressed speech recognition

Cited by: 43

Authors:

Sarikaya, R [1]

Hansen, JHL [1]

Affiliation:

[1] Univ Colorado, Ctr Spoken Language Res, Robust Speech Proc Lab, Boulder, CO 80309 USA

Source:

IEEE SIGNAL PROCESSING LETTERS | 2000 / Vol. 7 / No. 07

Keywords:

feature extraction; speech recognition; speech under stress; wavelet analysis

DOI:

10.1109/97.847363

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Subject classification codes:

0808; 0809

Abstract:

This letter investigates the impact of stress on monophone speech recognition accuracy and proposes a new set of acoustic parameters based on high resolution wavelet analysis. The two parameter schemes are entitled wavelet packet parameters (WPP) and subband-based cepstral parameters (SBC). The performance of these features is compared to traditional Mel-frequency cepstral coefficients (MFCC) for stressed speech monophone recognition. The stressed speaking styles considered are neutral, angry, loud, and Lombard effect speech from the SUSAS database. An overall monophone recognition improvement of 20.4% and 17.2% is achieved for loud and angry stressed speech, with a corresponding increase in the neutral monophone rate of 9.9% over MFCC parameters.

Pages: 182-185

Page count: 4
