Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Cited by: 0
Authors
Sai, B. Tarun [1]
Yadav, Ishwar Chandra [1]
Shahnawazuddin, S. [1]
Pradhan, Gayadhar [1]
Affiliation
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India
Source
2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018
Keywords
Speech recognition; pitch mismatch; spectral smoothing; modified EMD; CHILDRENS SPEECH; DECOMPOSITION;
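DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract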
In this paper, we present a novel approach to front-end speech parameterization that is more robust to pitch variations than the most commonly used technique. Earlier works have shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, results in poor performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which smooths the spectrum sufficiently. Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module yields a relative improvement of 12% over the baseline system, which employs acoustic modeling based on a deep neural network.
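The smoothing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the record does not specify the modified EMD, so the code below assumes a plain EMD sifting loop with piecewise-linear envelopes, and every function name (`first_imf`, `smooth_spectrum`, `rms_error`) is hypothetical. It relies on the fact that an EMD decomposition sums back to the input, so reconstructing from all modes except the first is the same as subtracting the first mode. MFCC extraction from the smoothed spectrum is left out.

```python
import math


def _local_extrema(x):
    """Return indices of interior local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def _envelope(x, idx):
    """Piecewise-linear envelope through x at idx, held flat beyond the ends."""
    env, j = [], 0
    for i in range(len(x)):
        if i <= idx[0]:
            env.append(x[idx[0]])
        elif i >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            while idx[j + 1] < i:
                j += 1
            a, b = idx[j], idx[j + 1]
            t = (i - a) / (b - a)
            env.append(x[a] + t * (x[b] - x[a]))
    return env


def first_imf(x, n_sift=8):
    """Extract the first (fastest-oscillating) mode by repeated sifting.

    Assumption: a fixed number of sifting passes stands in for a proper
    stopping criterion. Returns zeros when x has too few extrema.
    """
    h = list(x)
    for k in range(n_sift):
        maxima, minima = _local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            if k == 0:
                return [0.0] * len(x)  # nothing oscillatory to remove
            break
        upper = _envelope(h, maxima)
        lower = _envelope(h, minima)
        # Subtract the mean of the two envelopes from the current candidate.
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def smooth_spectrum(mag):
    """Discard the first mode; floor at zero so the result stays a magnitude."""
    imf1 = first_imf(mag)
    return [max(m - c, 0.0) for m, c in zip(mag, imf1)]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


if __name__ == "__main__":
    # Synthetic magnitude spectrum: slow envelope plus a fast harmonic ripple.
    env = [1.0 + 0.5 * math.cos(2 * math.pi * k / 256) for k in range(256)]
    mag = [e + 0.3 * math.cos(2 * math.pi * k / 8) for k, e in enumerate(env)]
    out = smooth_spectrum(mag)
    print("RMS error before:", rms_error(mag, env))
    print("RMS error after: ", rms_error(out, env))
```

On the synthetic frame in the demo, the fast ripple plays the role of pitch harmonics and the slow cosine plays the role of the vocal-tract envelope. Real speech spectra are far less regular, so the linear envelopes and fixed sifting count here should be read as placeholders.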
Pages: 242-246
Page count: 5