Robust Speech Recognition Using MLP Neural Network in Log-Spectral Domain

Cited by: 2
Authors
Ghaemmaghami, Masoumeh P. [1, 2]
Sameti, Hossein [3]
Razzazi, Farbod [1]
BabaAli, Bagher [3]
Dabbaghchian, Saeed [3]
Affiliations
[1] Islamic Azad Univ, Fac Engn, Dept Elect Engn, Sci & Res Branch, Tehran, Iran
[2] Islamic Azad Univ, Fac Engn, Dept Elect Engn, Young Res Club, Tehran, Iran
[3] Sharif Univ Technol, Dept Comp Engn, Speech Proc Lab, Tehran, Iran
Source
2009 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (ISSPIT 2009) | 2009
Keywords
MLP neural network; log spectral; robust speech recognition;
DOI
10.1109/ISSPIT.2009.5407513
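Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code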
0808 ; 0809 ;
Abstract
In this paper, we propose an efficient nonlinear feature-domain noise suppression algorithm motivated by the minimum mean square error (MMSE) optimization criterion. A multilayer perceptron (MLP) neural network operating in the log-spectral domain is employed to minimize the difference between noisy and clean speech. Used as a pre-processing stage of a speech recognition system, this method improves the recognition rate in noisy environments. We extend the application of the system to different environments with different noises without retraining the HMM models. We train the feature extraction stage on a small portion of noisy data, created by artificially adding different types of noise from the NOISEX-92 database to the TIMIT speech database. In real environments, where speech recognition systems must work, different types of noise occur at various SNRs. The proposed method offers four strategies, depending on the system's ability to identify the noise type and SNR. Experimental results show that the proposed method achieves a significant improvement in recognition rates.
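The pipeline the abstract describes (mixing noise into clean speech at a chosen SNR, moving to the log-spectral domain, and fitting an MLP under a mean square error objective) can be sketched roughly as follows. This is only an illustration and not the authors' implementation: it uses NumPy alone, the function names `mix_at_snr`, `log_spectrum`, and `train_mlp` are hypothetical, synthetic signals would have to stand in for TIMIT and NOISEX-92, and the frame size, single tanh hidden layer, learning rate, and per-dimension standardization are all assumptions that the abstract does not specify.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise

def log_spectrum(signal, frame_len=256, hop=128, eps=1e-10):
    """Frame the signal, apply a Hann window, return log power spectra."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps)

def train_mlp(noisy, clean, hidden=64, epochs=200, lr=0.05, seed=0):
    """Fit a one-hidden-layer MLP (assumed architecture) mapping noisy to
    clean log spectra by full-batch gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    # Standardize inputs and targets per dimension (an assumption).
    mu_x, sd_x = noisy.mean(0), noisy.std(0) + 1e-8
    mu_y, sd_y = clean.mean(0), clean.std(0) + 1e-8
    x, y = (noisy - mu_x) / sd_x, (clean - mu_y) / sd_y
    d_in, d_out = x.shape[1], y.shape[1]
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)           # hidden activations
        err = h @ w2 + b2 - y              # prediction error
        losses.append(np.mean(err ** 2))   # MSE objective
        g_out = 2.0 * err / err.size       # dLoss/dOutput
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(0)

    def predict(feats):
        """Map noisy log spectra to estimated clean log spectra."""
        h = np.tanh(((feats - mu_x) / sd_x) @ w1 + b1)
        return (h @ w2 + b2) * sd_y + mu_y

    return predict, losses
```

In a setup like the one the abstract outlines, the output of `predict` would be passed to the recognizer's feature extraction in place of the noisy log spectra, so that the HMM models themselves need no retraining.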
Pages: 467 / +
Number of pages: 3