High resolution speech feature parametrization for monophone-based stressed speech recognition

Cited by: 43
Authors
Sarikaya, R [1]
Hansen, JHL [1]
Affiliations
[1] Univ Colorado, Ctr Spoken Language Res, Robust Speech Proc Lab, Boulder, CO 80309 USA
Keywords
feature extraction; speech recognition; speech under stress; wavelet analysis
DOI
10.1109/97.847363
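Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract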
This letter investigates the impact of stress on monophone speech recognition accuracy and proposes a new set of acoustic parameters based on high-resolution wavelet analysis. The two parameter schemes are called wavelet packet parameters (WPP) and subband-based cepstral parameters (SBC). The performance of these features is compared to that of traditional Mel-frequency cepstral coefficients (MFCC) for stressed speech monophone recognition. The speaking styles considered are neutral, angry, loud, and Lombard effect speech from the SUSAS database. An overall monophone recognition improvement of 20.4% and 17.2% is achieved for loud and angry stressed speech, respectively, with a corresponding increase in the neutral monophone rate of 9.9% over MFCC parameters.
Pages: 182-185
Number of pages: 4
Related papers
50 in total
  • [31] A Neural Network Based Nonlinear Feature Transformation for Speech Recognition
    Hu, Hongbing
    Zahorian, Stephen A.
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1533 - +
  • [32] Selective Acoustic Feature Enhancement for Speech Emotion Recognition With Noisy Speech
    Leem, Seong-Gyun
    Fulford, Daniel
    Onnela, Jukka-Pekka
    Gard, David
    Busso, Carlos
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 917 - 929
  • [33] An Auditory Based Modulation Spectral Feature for Reverberant Speech Recognition
    Maganti, HariKrishna
    Matassoni, Marco
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 570 - 573
  • [34] Emotional Speech Recognition Based on Syllable Distribution Feature Extraction
    Zhang, Haiying
    FOUNDATIONS OF INTELLIGENT SYSTEMS (ISKE 2011), 2011, 122 : 415 - 420
  • [35] Speech Recognition in High Noise Environment
    Tang, Chunling
    Li, Min
    EKOLOJI, 2019, 28 (107): : 1561 - 1565
  • [36] Stressed speech recognition using a warped frequency scale
    Gharavian, D.
    Ahadi, S. M.
    IEICE ELECTRONICS EXPRESS, 2008, 5 (06) : 187 - 191
  • [37] Multi-class SVM for stressed speech recognition
    Besbes, Salsabil
    Lachiri, Zied
    2016 2ND INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR SIGNAL AND IMAGE PROCESSING (ATSIP), 2016, : 782 - 787
  • [38] Speech emotion recognition based on multimodal and multiscale feature fusion
    Hu, Huangshui
    Wei, Jie
    Sun, Hongyu
    Wang, Chuhang
    Tao, Shuo
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)
  • [39] A feature-based hierarchical speech recognition system for Hindi
    Samudravijaya, K
    Ahuja, R
    Bondale, N
    Jose, T
    Krishnan, S
    Poddar, P
    Rao, PVS
    Raveendran, R
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 1998, 23 (4): : 313 - 340
  • [40] English speech emotion recognition method based on speech recognition
    Liu, Man
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (2) : 391 - 398