A STUDY ON CEPSTRAL SUB-BAND NORMALIZATION FOR ROBUST ASR

Cited by: 0
Authors
Wang, Syu-Siang [1]
Hung, Jeih-Weih [2]
Tsao, Yu [1]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 115, Taiwan
[2] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
Source
2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING | 2012
Keywords
discrete wavelet transform; CMS; CMVN; RASTA; noise robust; speech recognition; SPEECH; NOISE
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a cepstral subband normalization (CSN) approach for robust speech recognition. CSN first applies the discrete wavelet transform (DWT) to decompose the original cepstral feature sequence into low-frequency band (LFB) and high-frequency band (HFB) parts. It then normalizes the LFB components and zeros out the HFB components. Finally, an inverse DWT is applied to the LFB and HFB components to form the normalized cepstral features. When Haar functions are used as the DWT bases, CSN can be computed efficiently, with a 50% reduction in the number of feature components. In addition, our experimental results on the Aurora-2 task show that CSN outperforms conventional cepstral mean subtraction (CMS), cepstral mean and variance normalization (CMVN), and histogram equalization (HEQ). We also integrate CSN with the advanced front-end (AFE) for feature extraction. Experimental results indicate that the integrated AFE+CSN achieves notable improvements over the original AFE. Its simple calculation, compact form, and effective noise robustness make CSN well suited to mobile applications.
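The steps named in the abstract (DWT split, LFB normalization, HFB zeroing, inverse DWT) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it assumes a single-level Haar DWT applied along the time axis of each cepstral dimension, and it assumes that "normalizes" means per-utterance mean and variance normalization, which the abstract does not specify. The function names `haar_dwt`, `haar_idwt`, and `csn`, the `eps` parameter, and the last-frame padding for odd-length utterances are all hypothetical choices made for this example.

```python
import numpy as np


def haar_dwt(x):
    """One-level Haar DWT along axis 0 (time); x has shape (T, D) with T even."""
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # low-frequency band (LFB)
    high = (even - odd) / np.sqrt(2.0)  # high-frequency band (HFB)
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleave the reconstructed even and odd frames."""
    out = np.empty((2 * low.shape[0], low.shape[1]), dtype=float)
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return out


def csn(cepstra, eps=1e-8):
    """Sketch of CSN on a (T, D) cepstral sequence.

    Assumption: the LFB is mean/variance normalized per utterance;
    the abstract only says the LFB is "normalized".
    """
    x = np.asarray(cepstra, dtype=float)
    n_frames = x.shape[0]
    if n_frames % 2:  # assumption: pad odd-length input by repeating the last frame
        x = np.vstack([x, x[-1:]])
    low, high = haar_dwt(x)
    low = (low - low.mean(axis=0)) / (low.std(axis=0) + eps)
    high = np.zeros_like(high)  # zero out the HFB components
    return haar_idwt(low, high)[:n_frames]
```

Because the HFB is zeroed, each pair of consecutive output frames is identical under this sketch, so only the normalized LFB (half as many frames) would need to be stored or processed. That is one way to read the 50% reduction mentioned in the abstract.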
Pages: 141-145
Number of pages: 5
Related papers
50 in total
  • [21] Integrating Codebook and Utterance Information in Cepstral Statistics Normalization Techniques for Robust Speech Recognition
    He, Guan-min
    Hung, Jeih-weih
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1231 - 1234
  • [22] Voice-Activity Detection Using Long-Term Sub-Band Entropy Measure
    Wang, Kun-Ching
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2012, E95A (09) : 1606 - 1609
  • [23] Phone recognition in critical bands using sub-band temporal modulations
    Li, Feipeng
    Mallidi, Sri Harish
    Hermansky, Hynek
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1814 - 1817
  • [24] Recognition of Cough Using Features Improved by Sub-band Energy Transformation
    Zhu, Chunmei
    Tian, Lianfang
    Li, Xiangyang
    Mo, Hongqiang
    Zheng, Zeguang
    PROCEEDINGS OF THE 2013 6TH INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2013), VOLS 1 AND 2, 2013, : 251 - 255
  • [25] Investigation of Sub-Band Discriminative Information between Spoofed and Genuine Speech
    Sriskandaraja, Kaavya
    Sethu, Vidhyasaharan
    Phu Ngoc Le
    Ambikairajah, Eliathamby
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 1710 - 1714
  • [26] Mathematical Modeling of Human Emotions Using Sub-band Coefficients of Wavelet Analysis
    Islam, Monira
    Ahmad, Mohiuddin
    Yusuf, Md Salah Uddin
    Ahmed, Tazrin
    2ND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATION COMMUNICATION TECHNOLOGY (ICEEICT 2015), 2015,
  • [27] Nonlinear Features of Bark Wavelet Sub-band Filtering for Pathological Voice Recognition
    Zhang, Xiao-Jun
    Zhu, Xin-Cheng
    Wu, Di
    Xiao, Zhong-Zhe
    Tao, Zhi
    Zhao, He-Ming
    ENGINEERING LETTERS, 2021, 29 (01) : 49 - 60
  • [28] Sub-Band Noise Reduction in Multi-Channel Digital Hearing Aid
    Wang, Qingyun
    Liang, Ruiyu
    Jing, Li
    Zou, Cairong
    Zhao, Li
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (01) : 292 - 295
  • [29] Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features
    Kovacs, Gyoergy
    Toth, Laszlo
    Grosz, Tamas
    SPEECH AND COMPUTER, 2014, 8773 : 386 - 393
  • [30] Characterizing Sub-Band Spectral Entropy Based Acoustics as Assessment of Vocal Correlate of Depression
    Yingthawornsuk, Thaweesak
    Thanawattano, Chusak
    INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2010), 2010, : 1179 - 1183