Enhancing Pitch Robustness of Speech Recognition System through Spectral Smoothing

Cited by: 0

Authors:

Sai, B. Tarun [1]

Yadav, Ishwar Chandra [1]

Shahnawazuddin, S. [1]

Pradhan, Gayadhar [1]

Affiliations:

[1] Natl Inst Technol Patna, Dept Elect & Commun Engn, Patna, Bihar, India

Source:

2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS (SPCOM 2018) | 2018

Keywords:

Speech recognition; pitch mismatch; spectral smoothing; modified EMD; children's speech; decomposition

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

In this paper, we present a novel approach to front-end speech parameterization that is more robust to pitch variations than the most commonly used technique. Earlier work has shown that insufficient smoothing of the magnitude spectrum leads to pitch-induced distortions. This, in turn, degrades the performance of speech recognition systems, especially for high-pitched child speakers. To overcome this shortcoming, the short-time magnitude spectrum is first decomposed into several components using a modified version of empirical mode decomposition (EMD). Next, the lowest-order component is discarded and the spectrum is reconstructed from the remaining higher-order modes, which smooths the spectrum sufficiently. The Mel-frequency cepstral coefficients (MFCC) are then extracted from the smoothed spectra. The signal-domain analyses presented in this paper demonstrate that the ill effects of pitch variations are significantly reduced by including the proposed spectral smoothing module. To validate this statistically, an automatic speech recognition system is developed using speech data from adult speakers. To simulate large pitch differences, evaluations are performed on a test set consisting of speech data from child speakers. Including the proposed spectral smoothing module leads to a relative improvement of 12% over the baseline system that employs acoustic modeling based on deep neural networks.
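
The smoothing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not describe the "modified" EMD, so the sketch falls back on a plain, simplified sifting procedure with linearly interpolated envelopes and a fixed number of passes. The function names (`local_extrema`, `first_mode`, `smooth_spectrum`) and the parameters `n_sift` and `floor` are hypothetical. Because the EMD components sum back to the original signal, reconstructing from all modes except the lowest-order one is equivalent to subtracting that first mode from the spectrum, which is what the sketch does.

```python
import numpy as np


def local_extrema(x):
    """Return indices of interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def first_mode(x, n_sift=10):
    """Estimate the fastest-oscillating (lowest-order) component of x.

    Assumption: a fixed number of sifting passes and linearly interpolated
    envelopes stand in for the usual spline envelopes and stopping criterion.
    """
    h = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(h))
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left to build envelopes
        upper = np.interp(idx, maxima, h[maxima])
        lower = np.interp(idx, minima, h[minima])
        h = h - 0.5 * (upper + lower)  # remove the local mean
    return h


def smooth_spectrum(mag, n_sift=10, floor=1e-10):
    """Discard the lowest-order mode of a short-time magnitude spectrum.

    The result is floored at a small positive value (an assumption) so that
    a logarithm can be applied later in an MFCC pipeline.
    """
    mag = np.asarray(mag, dtype=float)
    smoothed = mag - first_mode(mag, n_sift)
    return np.maximum(smoothed, floor)
```

In a full front end, `smooth_spectrum` would be applied to each frame's magnitude spectrum before the Mel filterbank, logarithm, and discrete cosine transform that produce the MFCCs.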


Pages: 242-246

Number of pages: 5

50 items in total

[1] Addressing noise and pitch sensitivity of speech recognition system through variational mode decomposition based spectral smoothing
Yadav, Ishwar Chandra
Shahnawazuddin, S.
Pradhan, Gayadhar
DIGITAL SIGNAL PROCESSING, 2019, 86 : 55 - 64
[2] SPECTRAL SMOOTHING BY VARIATIONAL MODE DECOMPOSITION AND ITS EFFECT ON NOISE AND PITCH ROBUSTNESS OF ASR SYSTEM
Yadav, Ishwar Chandra
Shahnawazuddin, S.
Govind, D.
Pradhan, Gayadhar
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5629 - 5633
[3] Enhancing robustness for speech recognition through bio-inspired auditory filter-bank
Maganti, Hari Krishna
Matassoni, Marco
INTERNATIONAL JOURNAL OF BIO-INSPIRED COMPUTATION, 2012, 4 (05) : 271 - 277
[4] Exploring the Role of Spectral Smoothing in context of Children's Speech Recognition
Ghai, Shweta
Sinha, Rohit
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1571 - 1574
[5] Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition
Yadav, Ishwar Chandra
Kumar, Avinash
Shahnawazuddin, S.
Pradhan, Gayadhar
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1601 - 1605
[6] Robustness Analysis of Automatic Speech Signal Recognition System Against Factors Degrading Speech Signal
Oska, Jaroslaw
Wojtun, Jaroslaw
Wodecki, Krzysztof
Piotrowski, Zbigniew
SPA 2015 SIGNAL PROCESSING ALGORITHMS, ARCHITECTURES, ARRANGEMENTS, AND APPLICATIONS, 2015, : 71 - 75
[7] Pseudo pitch synchronous analysis of speech with applications to speaker recognition
Zilca, RD
Kingsbury, B
Navrátil, J
Ramaswamy, GN
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (02) : 467 - 478
[8] FUZZY SMOOTHING OF HMM PARAMETERS IN SPEECH RECOGNITION
KOO, JM
UN, CK
ELECTRONICS LETTERS, 1990, 26 (11) : 743 - 744
[9] Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition
Sun, Yanqing
Zhou, Yu
Zhao, Qingwei
Zhang, Pengyuan
Pan, Fuping
Yan, Yonghong
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2010, E93D (09) : 2431 - 2439
[10] DELETED SMOOTHING OF HMM PARAMETERS IN SPEECH RECOGNITION
KIM, NS
UN, CK
ELECTRONICS LETTERS, 1993, 29 (09) : 735 - 736
