A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR

Times cited: 0
Authors
Wang, Syu-Siang [1]
Hung, Jeih-Weih [2]
Tsao, Yu [1]
Affiliations
[1] Academia Sinica, Research Center for Information Technology Innovation, Taipei 115, Taiwan
[2] National Chi Nan University, Department of Electrical Engineering, Nantou, Taiwan
Source
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING | 2012
Keywords
discrete wavelet transform; CMS; CMVN; RASTA; noise robust; speech recognition; SPEECH; NOISE
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a cepstral sub-band normalization (CSN) approach for robust speech recognition. CSN first applies the discrete wavelet transform (DWT) to decompose the original cepstral feature sequence into low-frequency-band (LFB) and high-frequency-band (HFB) components. It then normalizes the LFB components and zeros out the HFB components. Finally, an inverse DWT is applied to the processed LFB and HFB components to form the normalized cepstral features. When Haar functions are used as the DWT bases, CSN can be computed efficiently and halves the number of feature components. Experimental results on the Aurora-2 task show that CSN outperforms conventional cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). We also integrate CSN with the advanced front-end (AFE) for feature extraction, and the integrated AFE+CSN achieves notable improvements over the original AFE. Its simple computation, compact representation, and effective noise robustness make CSN well suited to mobile applications.
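The decompose, normalize, zero, and reconstruct pipeline described in the abstract can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes a one-level orthonormal Haar DWT applied along the time axis of each cepstral dimension. Because the abstract does not say which normalization is applied to the LFB, it also assumes a per-dimension mean and variance normalization (CMVN-style). The function names `haar_dwt`, `haar_idwt`, and `csn`, the `eps` parameter, and the odd-length padding rule are all hypothetical.

```python
from __future__ import annotations

import numpy as np


def haar_dwt(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One-level orthonormal Haar DWT along axis 0 (time); x is (T, D), T even."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # approximation coefficients (LFB)
    high = (even - odd) / np.sqrt(2.0)  # detail coefficients (HFB)
    return low, high


def haar_idwt(low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Inverse one-level Haar DWT: rebuilds the interleaved even/odd frames."""
    out = np.empty((2 * low.shape[0], *low.shape[1:]), dtype=low.dtype)
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return out


def csn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical CSN sketch for a (T, D) cepstral feature sequence.

    Assumptions (not stated in the abstract): the LFB is normalized to zero
    mean and unit variance per cepstral dimension, and an odd-length sequence
    is padded by repeating its last frame before the DWT.
    """
    num_frames = features.shape[0]
    x = np.asarray(features, dtype=float)
    if num_frames % 2 == 1:
        x = np.vstack([x, x[-1:]])            # pad to an even length
    low, high = haar_dwt(x)
    low = (low - low.mean(axis=0)) / (low.std(axis=0) + eps)  # normalize LFB
    high = np.zeros_like(high)                # zero out the HFB
    return haar_idwt(low, high)[:num_frames]  # drop the padded frame, if any


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.normal(loc=3.0, scale=2.0, size=(101, 13))  # stand-in for real MFCCs
    normalized = csn(mfcc)
    print(normalized.shape)  # (101, 13)
```

Because the HFB is zeroed, the two frames in each reconstructed pair are identical, so only the LFB array, with half as many frames, carries information. This is one plausible reading of the abstract's 50% reduction claim, not a detail confirmed by the source.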
Pages: 141-145
Number of pages: 5
Related papers
50 in total
  • [1] Overlapped sub-band modulation spectrum normalization techniques for robust speech recognition
    Fan, Hao-teng
    Yeh, Wei-jeih
    Hung, Jeih-weih
    2013 10TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2013, : 1035 - 1039
  • [2] GAMMATONE SUB-BAND MAGNITUDE-DOMAIN DEREVERBERATION FOR ASR
    Kumar, Kshitiz
    Singh, Rita
    Raj, Bhiksha
    Stern, Richard
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 4604 - 4607
  • [3] Intra-frame cepstral sub-band weighting and histogram equalization for noise-robust speech recognition
    Hung, J.-W.
    Fan, H.-T.
    Springer International Publishing, 2013
  • [4] Sub-band parametric cepstral distance measurement of voiceless alveolar fricative segments as a tool for identifying speaker-characteristic information robust to emotional variation
    Keith, Emma
    Kinoshita, Yuko
    INTERNATIONAL JOURNAL OF SPEECH LANGUAGE AND THE LAW, 2024, 31 (02) : 267 - 290
  • [5] Sub-band weighted projection measure for sub-band speech recognition in noise
    Nasersharif, B.
    Akbari, A.
    ELECTRONICS LETTERS, 2006, 42 (14) : 829 - 831
  • [6] WAVELET SUB-BAND BASED TEMPORAL FEATURES FOR ROBUST HINDI PHONEME RECOGNITION
    Farooq, O.
    Datta, S.
    Shrotriya, M. C.
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2010, 8 (06) : 847 - 859
  • [7] Sub-band Feature Statistics Compensation Techniques Based on Discrete Wavelet Transform for Robust Speech Recognition
    Fan, Hao-Teng
    Hung, Jeih-weih
    ICME: 2009 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-3, 2009, : 586 - 589
  • [8] An Improved Robust Statistical Voice Activity Detection based on Sub-band Periodic Intensity
    He, Weijun
    Feng, Xiaohui
    Zhu, Zhengyu
    Zhou, Weili
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 2171 - 2175
  • [9] Cepstral amplitude range normalization for noise robust speech recognition
    Yoshizawa, S
    Hayasaka, N
    Wada, N
    Miyanaga, Y
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2004, E87D (08) : 2130 - 2137
  • [10] Spectral Moment Features Augmented by Low Order Cepstral Coefficients for Robust ASR
    Tsiakoulis, Pirros
    Potamianos, Alexandros
    Dimitriadis, Dimitrios
    IEEE SIGNAL PROCESSING LETTERS, 2010, 17 (06) : 551 - 554