Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Authors
Sai, B. Tarun [1]
Yadav, Ishwar Chandra [1]
Shahnawazuddin, S. [1]
Pradhan, Gayadhar [1]
Affiliation
[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India
Source
2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018
Keywords
Speech recognition; pitch mismatch; spectral smoothing; modified EMD; children's speech; decomposition
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
In this paper, we present a novel approach to front-end speech parameterization that is more robust to pitch variations than the most commonly used technique. Earlier work has shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, degrades the performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which smooths the spectrum sufficiently. Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module yields a relative improvement of 12% over the baseline system, which employs acoustic modeling based on a deep neural network.
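The smoothing step described in the abstract can be illustrated with a rough sketch. The record does not specify the authors' modified EMD, so the code below substitutes a plain, simplified EMD sifting loop with piecewise-linear envelopes (standard EMD normally uses cubic splines). It assumes that the "lowest-order component" corresponds to the first intrinsic mode function, which carries the fastest ripple across frequency bins. All function names, the `n_sift` parameter, and the synthetic spectrum are hypothetical and chosen only for illustration.

```python
import math

def local_extrema(x):
    """Return indices of interior local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def linear_envelope(x, idx):
    """Piecewise-linear envelope through x at idx (assumption: not a spline).
    The two end bins are held at the value of the nearest extremum."""
    n = len(x)
    knots = [(0, x[idx[0]])] + [(i, x[i]) for i in idx] + [(n - 1, x[idx[-1]])]
    env, k = [], 0
    for i in range(n):
        while knots[k + 1][0] < i:
            k += 1
        (i0, v0), (i1, v1) = knots[k], knots[k + 1]
        env.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    return env

def first_imf(x, n_sift=4):
    """Estimate the fastest-oscillating mode with a fixed number of sifts."""
    h = list(x)
    sifted = False
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        upper = linear_envelope(h, maxima)
        lower = linear_envelope(h, minima)
        # Subtract the mean of the upper and lower envelopes.
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
        sifted = True
    # If no sifting was possible, report an all-zero mode.
    return h if sifted else [0.0] * len(x)

def smooth_spectrum(mag, n_sift=4):
    """Discard the first mode; what remains is the smoothed spectrum."""
    imf1 = first_imf(mag, n_sift)
    # Clamp at zero so the result stays a valid magnitude spectrum.
    return [max(m - c, 0.0) for m, c in zip(mag, imf1)]

def roughness(x):
    """Sum of absolute second differences, a crude ripple measure."""
    return sum(abs(x[i - 1] - 2 * x[i] + x[i + 1]) for i in range(1, len(x) - 1))

# Synthetic magnitude spectrum: a slow formant-like bump plus a fast
# ripple standing in for pitch harmonics (toy data, not real speech).
n_bins = 257
mag = [1.0 + 0.5 * math.exp(-((k - 60) / 30.0) ** 2)
       + 0.3 * math.cos(2 * math.pi * k / 8) for k in range(n_bins)]
smoothed = smooth_spectrum(mag)
print(roughness(mag), roughness(smoothed))
```

In a full front end, the smoothed spectrum of each frame would then be passed to the usual Mel filterbank, logarithm, and DCT stages to obtain MFCCs; those stages are omitted here.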
Pages: 242-246
Page count: 5