High resolution speech feature parametrization for monophone-based stressed speech recognition

Cited by: 43
Authors
Sarikaya, R [1]
Hansen, JHL [1]
Affiliation
[1] Univ Colorado, Ctr Spoken Language Res, Robust Speech Proc Lab, Boulder, CO 80309 USA
Keywords
feature extraction; speech recognition; speech under stress; wavelet analysis;
DOI
10.1109/97.847363
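Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract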
This letter investigates the impact of stress on monophone speech recognition accuracy and proposes a new set of acoustic parameters based on high resolution wavelet analysis. The two parameter schemes are entitled wavelet packet parameters (WPP) and subband-based cepstral parameters (SBC). The performance of these features is compared to traditional Mel-frequency cepstral coefficients (MFCC) for stressed speech monophone recognition. The stressed speaking styles considered are neutral, angry, loud, and Lombard effect speech from the SUSAS database. An overall monophone recognition improvement of 20.4% and 17.2% is achieved for loud and angry stressed speech, with a corresponding increase in the neutral monophone rate of 9.9% over MFCC parameters.
Pages: 182 - 185
Page count: 4
Related papers
50 in total
  • [41] English speech emotion recognition method based on speech recognition
    Man Liu
    International Journal of Speech Technology, 2022, 25 : 391 - 398
  • [42] Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition
    Sakai, Makoto
    Kitaoka, Norihide
    Takeda, Kazuya
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (05) : 1244 - 1252
  • [43] An application of discriminative feature extraction to filter-bank-based speech recognition
    Biem, A
    Katagiri, S
    McDermott, E
    Juang, BH
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (02) : 96 - 110
  • [44] Amazigh CNN speech recognition system based on Mel spectrogram feature extraction method
    Boulal H.
    Hamidi M.
    Abarkan M.
    Barkani J.
    International Journal of Speech Technology, 2024, 27 (01) : 287 - 296
  • [45] A clustering based feature selection method in spectro-temporal domain for speech recognition
    Esfandian, Nafiseh
    Razzazi, Farbod
    Behrad, Alireza
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2012, 25 (06) : 1194 - 1202
  • [46] Unified Training of Feature Extractor and HMM Classifier for Speech Recognition
    Im, Jung-Hui
    Lee, Soo-Young
    IEEE SIGNAL PROCESSING LETTERS, 2012, 19 (02) : 111 - 114
  • [47] Feature extraction for HMM speech recognition systems using DTW
    Go, J
    Hyun, D
    Lee, C
    6TH WORLD MULTICONFERENCE ON SYSTEMICS, CYBERNETICS AND INFORMATICS, VOL III, PROCEEDINGS: IMAGE, ACOUSTIC, SPEECH AND SIGNAL PROCESSING I, 2002, : 241 - 244
  • [48] Introducing Temporal Asymmetries in Feature Extraction for Automatic Speech Recognition
    Sivaram, G. S. V. S.
    Hermansky, Hynek
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 890 - 893
  • [49] Gammatone features and feature combination for large vocabulary speech recognition
    Schlueter, R.
    Bezrukov, I.
    Wagner, H.
    Ney, H.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 649 - 652
  • [50] An auditory neural feature extraction method for robust speech recognition
    Guo, Wei
    Zhang, Liqing
    Xia, Bin
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 793 - +