Stereo-based histogram equalization for robust speech recognition

Cited by: 0
Authors
Randa Al-Wakeel
Mahmoud Shoman
Magdy Aboul-Ela
Sherif Abdou
Affiliations
[1] Sadat Academy for Management Science and Information Systems, Faculty of Computers and Information, Information Technology Department
[2] Cairo University
Source
EURASIP Journal on Audio, Speech, and Music Processing, vol. 2015
Keywords
Robust speech recognition; Speech feature normalization; Histogram equalization; Speech enhancement
DOI
Not available
Abstract
Automatic speech recognition (ASR) performs best when the system is tested under conditions identical to those in which it was trained. In practice, however, the training and testing environments often differ, and noise in real environments is one cause of this mismatch. Speech enhancement techniques have been developed to make ASR systems robust to such noise. In this work, a method based on histogram equalization (HEQ) is proposed to compensate for nonlinear distortions in the speech representation. The approach uses stereo data, that is, simultaneous recordings of clean speech and the corresponding noisy speech, to compute a stereo Gaussian mixture model (GMM). The stereo GMM is used to compute the cumulative distribution function (CDF) of both clean and noisy speech with a sigmoid function, rather than with the order statistics used in other HEQ-based methods. Two ways of applying HEQ are presented: hard-decision HEQ and soft-decision HEQ, the latter based on minimum mean square error (MMSE) clean speech estimation. The experiments show that both soft and hard HEQ achieve better recognition results than other HEQ approaches such as tabular HEQ, quantile HEQ, and polynomial-fit HEQ, and that soft HEQ achieves notably better results than hard HEQ. They also show that HEQ improves the performance of other speech enhancement techniques such as stereo piecewise linear compensation for environment (SPLICE) and vector Taylor series (VTS), and that using HEQ in multi-style training (MST) significantly improves ASR performance.
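The abstract gives no formulas, so the following Python sketch is only one possible reading of the idea, not the authors' implementation. It assumes a one-dimensional feature, a hypothetical two-component stereo GMM with made-up parameters, and the common logistic approximation to the Gaussian CDF (slope 1.702) as the "sigmoid function". It further assumes that hard-decision HEQ equalizes with the single most probable mixture component, while soft-decision HEQ takes a posterior-weighted average over components as an MMSE-style estimate. All names (`STEREO_GMM`, `hard_heq`, `soft_heq`, and so on) are illustrative.

```python
import math

# Hypothetical 1-D stereo GMM (illustrative values, not from the paper).
# Each component: (weight, noisy mean, noisy std, clean mean, clean std).
STEREO_GMM = [
    (0.5, 0.0, 1.0, -1.0, 0.5),
    (0.5, 4.0, 2.0, 3.0, 1.0),
]
SLOPE = 1.702  # assumed logistic approximation to the Gaussian CDF


def sigmoid_cdf(v, mean, std):
    """Sigmoid approximation of a Gaussian CDF, written to avoid overflow."""
    z = SLOPE * (v - mean) / std
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inverse_sigmoid_cdf(p, mean, std):
    """Inverse of sigmoid_cdf; p is clamped away from 0 and 1."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return mean + std * math.log(p / (1.0 - p)) / SLOPE


def posteriors(y, gmm):
    """Posterior probability of each component given the noisy feature y."""
    likes = [
        w * math.exp(-0.5 * ((y - my) / sy) ** 2) / (sy * math.sqrt(2 * math.pi))
        for w, my, sy, _, _ in gmm
    ]
    total = sum(likes)
    return [like / total for like in likes]


def equalize_component(y, comp):
    """HEQ within one component: clean_CDF^-1(noisy_CDF(y))."""
    _, my, sy, mx, sx = comp
    return inverse_sigmoid_cdf(sigmoid_cdf(y, my, sy), mx, sx)


def hard_heq(y, gmm=STEREO_GMM):
    """Assumed hard decision: use only the most probable component."""
    post = posteriors(y, gmm)
    best = max(range(len(gmm)), key=lambda k: post[k])
    return equalize_component(y, gmm[best])


def soft_heq(y, gmm=STEREO_GMM):
    """Assumed soft decision: posterior-weighted average over components."""
    post = posteriors(y, gmm)
    return sum(p * equalize_component(y, c) for p, c in zip(post, gmm))


if __name__ == "__main__":
    for y in (0.0, 2.0, 4.0):
        print(y, hard_heq(y), soft_heq(y))
```

Under these assumptions each component's mapping reduces to a linear rescaling of the noisy feature, so any nonlinearity in the overall transform comes from how the components are selected or blended. Whether the paper's method behaves this way cannot be confirmed from the abstract alone.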
Related papers
  • [1] Stereo-based histogram equalization for robust speech recognition
    Al-Wakeel, Randa
    Shoman, Mahmoud
    Aboul-Ela, Magdy
    Abdou, Sherif
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2015
  • [2] Stereo-based stochastic mapping for robust speech recognition
    Afify, Mohamed
    Cui, Xiaodong
    Gao, Yuqing
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007: 377+
  • [3] Stereo-Based Stochastic Mapping for Robust Speech Recognition
    Afify, Mohamed
    Cui, Xiaodong
    Gao, Yuqing
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (07): 1325-1334
  • [4] Histogram equalization of speech representation for robust speech recognition
    de la Torre, A
    Peinado, AM
    Segura, JC
    Pérez-Córdoba, JL
    Benítez, MC
    Rubio, AJ
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03): 355-366
  • [5] Class-based histogram equalization for robust speech recognition
    Suh, Youngjoo
    Kim, Hoirin
    ETRI JOURNAL, 2006, 28 (04): 502-505
  • [6] Synthesized stereo-based stochastic mapping with data selection for robust speech recognition
    Du, Jun
    Huo, Qiang
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012: 122-125
  • [7] Stereo-based stochastic mapping with discriminative training for noise robust speech recognition
    Cui, Xiaodong
    Afify, Mohamed
    Gao, Yuqing
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009: 3933+
  • [8] Histogram Equalization to Model Adaptation for Robust Speech Recognition
    Suh, Youngjoo
    Kim, Hoirin
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2010
  • [9] Histogram equalization and noise masking for robust speech recognition
    Zhang, Xueru
    Demuynck, Kris
    Van Hamme, Hugo
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 4578-4581
  • [10] Probabilistic class histogram equalization for robust speech recognition
    Suh, Youngjoo
    Ji, Mikyong
    Kim, Hoirin
    IEEE SIGNAL PROCESSING LETTERS, 2007, 14 (04): 287-290