Using Deep Speech Recognition to Evaluate Speech Enhancement Methods

Cited by: 2
Authors
Siddiqui, Shamoon [1]
Rasool, Ghulam [1]
Ramachandran, Ravi P. [1]
Bouaynaya, Nidhal C. [1]
Affiliations
[1] Rowan Univ, Dept Elect & Comp Engn, Glassboro, NJ 08028 USA
Keywords
speech enhancement; distribution shift; signal-to-noise; benchmark; NOISE
DOI
10.1109/ijcnn48605.2020.9206817
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Progress in speech-related tasks depends on the quality of the speech signal being processed. While much progress has been made in various aspects of speech processing (including, but not limited to, speech recognition, language detection, and speaker diarization), the effect of enhancing a noise-corrupted speech signal on those tasks has not been rigorously evaluated. Speech enhancement aims to improve the signal-to-noise ratio of a noise-corrupted signal by boosting the speech elements (signal) and reducing the non-speech ones (noise). Speech enhancement techniques are evaluated using metrics that are either subjective (asking people for their opinion of the enhanced signal) or objective (calculating metrics from the signal itself). The subjective measures are better indicators of improved quality but do not scale well to large datasets. The objective metrics have mostly been constructed to model the subjective results. Our goal in this work is to establish a benchmark that assesses the improvement from speech enhancement as it relates to the downstream task of automated speech recognition. In doing so, we retain the qualities of subjective measures while ensuring that evaluation can be done at a large scale in an automated fashion. We explore the impact of various noise types, including stationary noise, non-stationary noise, and a shift in noise distribution. We find that existing objective metrics are not a strong indicator of improvement in a downstream task. As such, we believe that Word Error Rate should be used when the downstream task is automated speech recognition.
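The abstract names Word Error Rate and signal-to-noise ratio but does not define them or give evaluation code. The following is a minimal sketch, assuming the standard definitions: WER as the word-level edit distance divided by the number of reference words, and SNR in decibels as the ratio of clean-signal power to residual-noise power. The function names `word_error_rate` and `snr_db` and the example transcripts are hypothetical illustrations, not taken from the paper.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical transcripts: the same recognizer run on a noisy signal
# and on an enhanced version of that signal.
reference = "turn on the kitchen lights"
wer_noisy = word_error_rate(reference, "turn on the chicken light")
wer_enhanced = word_error_rate(reference, "turn on the kitchen light")
print(f"WER noisy: {wer_noisy:.2f}, WER enhanced: {wer_enhanced:.2f}")
```

Under this framing, an enhancement method would be judged by how much it lowers WER on the recognizer's output, rather than by the SNR gain alone.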
Pages: 7