Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Cited by: 0
Authors
Sai, B. Tarun [1 ]
Yadav, Ishwar Chandra [1 ]
Shahnawazuddin, S. [1 ]
Pradhan, Gayadhar [1 ]
Affiliations
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India
Source
2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018
Keywords
Speech recognition; pitch mismatch; spectral smoothing; modified EMD; CHILDRENS SPEECH; DECOMPOSITION;
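DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code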
0808 ; 0809 ;
Abstract
In this paper, we present a novel approach to front-end speech parameterization that is more robust to pitch variations than the most commonly used technique. Earlier work has shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, degrades the performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which smooths the spectrum sufficiently. The Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module leads to a relative improvement of 12% over the baseline system, which employs acoustic modeling based on a deep neural network.
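The smoothing step described in the abstract (decompose the magnitude spectrum, drop the lowest-order mode, rebuild from the rest) can be illustrated with a rough sketch. The abstract does not specify the modified EMD, so the code below substitutes a plain EMD-style sifting with piecewise-linear envelopes; this is an assumption, not the authors' method. All function names (`local_extrema`, `linear_envelope`, `first_mode`, `smooth_spectrum`) are hypothetical, the test spectrum is synthetic, and MFCC extraction is not shown. Because the modes of an EMD sum back to the input, rebuilding from all modes except the first is equivalent to subtracting the first mode.

```python
import math

def local_extrema(x):
    """Return indices of interior local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def linear_envelope(idx, vals, n):
    """Piecewise-linear envelope through (idx, vals), held flat beyond the ends."""
    env = [0.0] * n
    j = 0
    for k in range(n):
        if k <= idx[0]:
            env[k] = vals[0]
        elif k >= idx[-1]:
            env[k] = vals[-1]
        else:
            while idx[j + 1] < k:
                j += 1
            t = (k - idx[j]) / (idx[j + 1] - idx[j])
            env[k] = vals[j] + t * (vals[j + 1] - vals[j])
    return env

def first_mode(x, n_sift=8):
    """Estimate the fastest-oscillating mode with a fixed number of sifting passes."""
    h = list(x)
    sifted = False
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        upper = linear_envelope(maxima, [h[i] for i in maxima], len(h))
        lower = linear_envelope(minima, [h[i] for i in minima], len(h))
        # Subtract the local mean of the two envelopes.
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        sifted = True
    # If no sifting was possible, treat the input as having no ripple.
    return h if sifted else [0.0] * len(x)

def smooth_spectrum(mag):
    """Drop the first mode; equivalent to summing all remaining modes."""
    ripple = first_mode(mag)
    return [max(m - r, 0.0) for m, r in zip(mag, ripple)]

def mean_abs_err(a, b):
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

# Synthetic spectrum: slow envelope plus a ripple repeating every 8 bins,
# standing in for pitch harmonics (an illustrative assumption).
n_bins = 256
true_env = [1.0 + 0.5 * math.cos(2 * math.pi * k / n_bins) for k in range(n_bins)]
spectrum = [e + 0.3 * math.cos(2 * math.pi * k / 8) for k, e in enumerate(true_env)]

smoothed = smooth_spectrum(spectrum)
err_raw = mean_abs_err(spectrum, true_env)
err_smooth = mean_abs_err(smoothed, true_env)
print(f"mean abs deviation from envelope: raw={err_raw:.4f}, smoothed={err_smooth:.4f}")
```

In a real front end the input would be the short-time magnitude spectrum of each speech frame, and cubic-spline envelopes are the more usual choice in EMD; linear interpolation is used here only to keep the sketch dependency-free.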
Pages: 242-246
Number of pages: 5
Related papers
50 in total
  • [31] Improving speech recognition robustness using non-standard windows
    Rozman, R
    Kodek, DM
    IEEE REGION 8 EUROCON 2003, VOL B, PROCEEDINGS: COMPUTER AS A TOOL, 2003, : 171 - 174
  • [32] Towards robustness to speech rate in mandarin all-syllable recognition
    Chen, YN
    Zhu, X
    Liu, J
    Liu, RS
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2003, 18 (06) : 756 - 761
  • [33] LINE SPECTRAL FREQUENCY REPRESENTATION OF SUBBANDS FOR SPEECH RECOGNITION
    ERZIN, E
    CETIN, AE
    SIGNAL PROCESSING, 1995, 44 (01) : 117 - 119
  • [34] Line spectral frequency representation of subbands for speech recognition
    Erzin, E., 1600, Elsevier Science B.V., Amsterdam, Netherlands (44):
  • [35] Filtering the time sequences of spectral parameters for speech recognition
    Nadeu, C
    Paches-Leal, P
    Juang, BH
    SPEECH COMMUNICATION, 1997, 22 (04) : 315 - 332
  • [36] Spectral difference for statistical model-based speech enhancement in speech recognition
    Lee, Soojeong
    Chang, Joon-Hyuk
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (23) : 24917 - 24929
  • [37] Spectral difference for statistical model-based speech enhancement in speech recognition
    Soojeong Lee
    Joon-Hyuk Chang
    Multimedia Tools and Applications, 2017, 76 : 24917 - 24929
  • [38] Emotional speech recognition based on modified parameter and distance of statistical model of pitch
    Department of Radio Engineering, Southeast University, Nanjing 210096, China
    Shengxue Xuebao, 2006, 1 (28-34):
  • [39] Robust Feature Extraction for Speech Recognition by Enhancing Auditory Spectrum
    Alam, Md Jahangir
    Kenny, Patrick
    O'Shaughnessy, Douglas
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1358 - 1361
  • [40] Research of Embedded Speech Recognition System
    Zhu Xuelai
    PROCEEDINGS OF THE THIRD INTERNATIONAL SYMPOSIUM ON TEST AUTOMATION & INSTRUMENTATION, VOLS 1 - 4, 2010, : 1481 - 1484