A PROGRESSIVE LEARNING APPROACH TO ADAPTIVE NOISE AND SPEECH ESTIMATION FOR SPEECH ENHANCEMENT AND NOISY SPEECH RECOGNITION

Citations: 9
Authors
Nian, Zhaoxu [1]
Tu, Yan-Hui [1]
Du, Jun [1]
Lee, Chin-Hui [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Funding
National Key Research and Development Program of China
Keywords
Speech recognition; speech enhancement; progressive learning; improved minima controlled recursive averaging; adaptive noise and speech estimation;
DOI
10.1109/ICASSP39728.2021.9413395
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose a progressive learning-based adaptive noise and speech estimation (PL-ANSE) method for speech preprocessing in noisy speech recognition. The method combines the frame-level noise tracking capability of improved minima controlled recursive averaging (IMCRA) with utterance-level deep progressive learning of the nonlinear interactions between speech and noise. First, a bi-directional long short-term memory model is adopted at each network layer to learn progressive ratio masks (PRMs) as targets with progressively increasing signal-to-noise ratios. Then, the PRMs estimated at the utterance level are combined with a conventional speech enhancement algorithm operating at the frame level. Finally, the enhanced speech, based on this multi-level information fusion, is fed directly into a speech recognition system to improve recognition performance. Experiments show that the proposed approach achieves a relative word error rate (WER) reduction of 22.1% compared with unprocessed noisy speech (from 23.84% to 18.57%) on the CHiME-4 single-channel real test data.
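The relative WER reduction quoted in the abstract can be checked from the two absolute WER figures it reports. The snippet below is only an arithmetic sketch; the function name `relative_reduction` is chosen here for illustration and does not come from the paper.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the reduction from `baseline` to `improved` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# WER values (in percent) as reported in the abstract.
baseline_wer = 23.84   # unprocessed noisy speech
enhanced_wer = 18.57   # after PL-ANSE preprocessing

absolute_drop = baseline_wer - enhanced_wer
relative_drop = relative_reduction(baseline_wer, enhanced_wer)

print(f"absolute: {absolute_drop:.2f} points, relative: {relative_drop:.1f}%")
```

The absolute drop is 5.27 percentage points, and dividing that by the 23.84% baseline gives the 22.1% relative reduction stated above.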
Pages: 6913-6917
Number of pages: 5