Spectrum enhancement with sparse coding for robust speech recognition

Cited by: 11
Authors
He, Yongjun [1]
Sun, Guanglu [1]
Han, Jiqing [2]
Affiliations
[1] Harbin Univ Sci & Technol, Harbin 150080, Peoples R China
[2] Harbin Inst Technol, Harbin 150001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sparse coding; Speech denoising; Residual noise; Basis pursuit denoising; JOINT COMPENSATION; REPRESENTATION; NOISE; ADAPTATION; REGRESSION; EQUATIONS; FEATURES; SYSTEMS;
DOI
10.1016/j.dsp.2015.04.014
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Subject classification codes
0808 ; 0809 ;
Abstract
A recent trend in speech recognition is to use sparse coding for noise robustness. Although several methods have been proposed, the performance of sparse coding in speech denoising has been less promising than expected. Sparse coding assumes that the representation of speech over the speech dictionary is sparse, while that of the noise is dense. This assumption does not hold in the speech denoising scenario, because many noises are also sparse over the speech dictionary. In that case, the representation of noisy speech still contains noise components, which degrades performance. To solve this problem, we first analyze the assumption of sparse coding and then propose a novel method to enhance the speech spectrum. The method first identifies the atoms that represent the noise sparsely and then selectively ignores them when reconstructing the speech, which reduces the residual noise. Speech features are then extracted from the enhanced spectrum for speech recognition. Experimental results show that the proposed method can substantially improve the noise robustness of a speech recognition system. (C) 2015 Elsevier Inc. All rights reserved.
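The selective reconstruction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how noise atoms are identified, so the sketch assumes a separate noise estimate is available and flags any atom whose coefficient in the noise's sparse code exceeds a threshold. The names `ista`, `enhance`, `lam`, and `thresh` are hypothetical, and ISTA is used here only as one simple solver for the basis pursuit denoising objective.

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    """Approximately solve min_x 0.5*||y - D x||^2 + lam*||x||_1 (basis pursuit denoising)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - D.T @ (D @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def enhance(D, noisy, noise_est, lam=0.1, thresh=1e-3):
    """Reconstruct a spectrum frame while ignoring atoms that also code the noise.

    Assumption (not from the source): noise atoms are those with a
    non-negligible coefficient when the noise estimate is coded over D.
    """
    x_noisy = ista(D, noisy, lam)
    x_noise = ista(D, noise_est, lam)
    noise_atoms = np.abs(x_noise) > thresh
    x_kept = np.where(noise_atoms, 0.0, x_noisy)   # drop the flagged atoms
    return D @ x_kept, noise_atoms

# Toy usage with an identity dictionary (each atom is one spectral bin).
D = np.eye(4)
speech = np.array([1.0, 0.0, 0.0, 0.0])
noise = np.array([0.0, 0.0, 0.5, 0.0])
enhanced, noise_atoms = enhance(D, speech + noise, noise)
```

In the toy example the noise occupies a single atom, which is exactly the case the abstract highlights: the noise is itself sparse over the dictionary, so plain sparse coding would retain it, while dropping the flagged atom removes it from the reconstruction.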
Pages: 59-70
Number of pages: 12
Related papers
50 in total
  • [31] Bridging the Gap: Integrating Pre-trained Speech Enhancement and Recognition Models for Robust Speech Recognition
    Wang, Kuan-Chen
    Li, You-Jin
    Chen, Wei-Lun
    Chen, Yu-Wen
    Wang, Yi-Ching
    Yeh, Ping-Cheng
    Zhang, Chao
    Tsao, Yu
    32ND EUROPEAN SIGNAL PROCESSING CONFERENCE, EUSIPCO 2024, 2024, : 426 - 430
  • [32] Supervised sparse patch coding towards misalignment-robust face recognition
    Lang, Congyan
    Feng, Songhe
    Chen, Bin
    Yuan, Xiaotong
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2013, 24 (02) : 103 - 110
  • [33] Sparse coding based features for speech units classification
    Sharma, Pulkit
    Abrol, Vinayak
    Dileep, A. D.
    Sao, Anil Kumar
    COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 333 - 350
  • [34] Weighted sparse coding regularized nonconvex matrix regression for robust face recognition
    Zhang, Hengmin
    Yang, Jian
    Xie, Jianchun
    Qian, Jianjun
    Zhang, Bob
    INFORMATION SCIENCES, 2017, 394 : 1 - 17
  • [35] Obtaining full regularization paths for robust sparse coding with applications to face recognition
    Chorowski, Jan
    Zurada, Jacek
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 2012, : 356 - 361
  • [36] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
    Hwang, Jung-Wook
    Park, Jeongkyun
    Park, Rae-Hong
    Park, Hyung-Min
    APPLIED ACOUSTICS, 2023, 211
  • [37] Cochannel Speech Segregation with Sparse Coding
    Ingale, Pallavi P.
    Nalbalwar, S. L.
    2016 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, AND OPTIMIZATION TECHNIQUES (ICEEOT), 2016, : 4589 - 4592
  • [38] Comparing Front-End Enhancement Techniques and Multiconditioned Training for Robust Automatic Speech Recognition
    Soni, Meet H.
    Joshi, Sonal
    Panda, Ashish
    TEXT, SPEECH, AND DIALOGUE (TSD 2019), 2019, 11697 : 329 - 340
  • [39] Robust supervised sparse representation for face recognition
    Mi, Jian-Xun
    Sun, Yueru
    Lu, Jia
    Kong, Heng
    COGNITIVE SYSTEMS RESEARCH, 2020, 62 : 10 - 22
  • [40] Temporal Envelope Subtraction for Robust Speech Recognition Using Modulation Spectrum
    Ganapathy, Sriram
    Thomas, Samuel
    Hermansky, Hynek
    2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 164 - 169