Enhancements in automatic Kannada speech recognition system by background noise elimination and alternate acoustic modelling

Cited by: 0

Authors:

G. Thimmaraja Yadava

H. S. Jayanna

Affiliations:

[1] Jain Deemed to be University, Department of Electronics and Communication Engineering, School of Engineering and Technology

[2] Siddaganga Institute of Technology, Department of Information Science and Engineering

Source:

International Journal of Speech Technology | 2020 / Vol. 23

Keywords:

Speech; Speech recognition; Interactive voice response system (IVRS); Automatic speech recognition (ASR); Spectral subtraction with voice activity detection (SS-VAD); Minimum mean square error spectrum power estimator based on zero crossing (MMSE-SPZC); Minimum mean square error spectrum power (MMSE-SP); Maximum a Posteriori (MAP)

DOI:

Not available

Abstract:

In this paper, the improvements made to a recently implemented Kannada speech recognition system are demonstrated in detail. The Kannada automatic speech recognition (ASR) system consists of ASR models created using Kaldi, an IVRS call flow, and databases of weather and agricultural commodity price information. The task-specific speech data used in the recently developed spoken dialogue system contained high levels of various background noises, which degraded both online and offline speech recognition performance. Therefore, to improve the speech recognition accuracy of the Kannada ASR system, a noise reduction algorithm is developed that fuses spectral subtraction with voice activity detection (SS-VAD) and the minimum mean square error spectrum power estimator based on zero crossing (MMSE-SPZC). The noise elimination algorithm is placed in the system before the feature extraction stage. Alternative ASR models are created using subspace Gaussian mixture model (SGMM) and deep neural network (DNN) modeling techniques. The experimental results show that combining the noise elimination technique with SGMM/DNN-based modeling yields a relative improvement of 7.68% in accuracy compared to the recently developed GMM-HMM-based ASR system. The acoustic models with the lowest word error rate (WER) can be used in the spoken dialogue system. The developed spoken query system was tested by farmers in Karnataka under uncontrolled conditions.
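As a rough illustration of where a noise elimination stage sits before feature extraction, the following Python/NumPy sketch implements plain magnitude spectral subtraction gated by a simple energy-based voice activity detector. It is not the authors' SS-VAD/MMSE-SPZC fusion: the MMSE-SPZC estimator is not implemented. The function name `spectral_subtraction_vad` and all parameter values (frame length, hop, 6 dB VAD threshold, over-subtraction factor, spectral floor, smoothing constant) are hypothetical choices. The sketch also assumes that the first frame contains only noise.

```python
import numpy as np


def spectral_subtraction_vad(x, frame_len=256, hop=128, vad_threshold_db=6.0,
                             over_sub=2.0, floor=0.02, alpha=0.98):
    """Hypothetical sketch: magnitude spectral subtraction with an energy-based VAD.

    Returns the enhanced signal and a per-frame list of VAD speech flags.
    Assumes the first frame is noise-only and len(x) >= frame_len.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    noise_psd = None
    speech_flags = []
    eps = 1e-12
    for i in range(n_frames):
        start = i * hop
        spec = np.fft.rfft(x[start:start + frame_len] * window)
        power = np.abs(spec) ** 2
        if noise_psd is None:
            noise_psd = power.copy()  # initial noise estimate from the first frame
        # VAD: frame energy relative to the current noise energy estimate, in dB
        snr_db = 10.0 * np.log10((power.sum() + eps) / (noise_psd.sum() + eps))
        is_speech = bool(snr_db >= vad_threshold_db)
        speech_flags.append(is_speech)
        if not is_speech:
            # Recursive averaging: update the noise PSD only on non-speech frames
            noise_psd = alpha * noise_psd + (1.0 - alpha) * power
        # Over-subtract the noise estimate and keep a spectral floor
        clean_power = np.maximum(power - over_sub * noise_psd, floor * power)
        gain = np.sqrt(clean_power / (power + eps))
        # Keep the noisy phase; weighted overlap-add with a synthesis window
        frame_out = np.fft.irfft(gain * spec, n=frame_len)
        out[start:start + frame_len] += frame_out * window
        norm[start:start + frame_len] += window ** 2
    nonzero = norm > 1e-8
    out[nonzero] /= norm[nonzero]
    return out, speech_flags
```

The VAD matters because the noise estimate is refreshed only on frames labeled as non-speech, so speech energy does not leak into the noise spectrum that is being subtracted. The enhanced signal would then be passed to the usual feature extraction front end.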


Pages: 149-167

Number of pages: 19

References: 53