Using Deep Speech Recognition to Evaluate Speech Enhancement Methods

被引:2
|
作者
Siddiqui, Shamoon [1 ]
Rasool, Ghulam [1 ]
Ramachandran, Ravi P. [1 ]
Bouaynaya, Nidhal C. [1 ]
机构
[1] Rowan Univ, Dept Elect & Comp Engn, Glassboro, NJ 08028 USA
关键词
speech enhancement; distribution shift; signal-to-noise; benchmark; NOISE;
D O I
10.1109/ijcnn48605.2020.9206817
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Progress in speech-related tasks is dependent on the quality of the speech signal being processed. While much progress has been made in various aspects of speech processing (including but not limited to, speech recognition, language detection, and speaker diarization), enhancing a noise-corrupted speech signal as it relates to those tasks has not been rigorously evaluated. Speech enhancement aims to improve the signal-to-noise ratio of a noise-corrupted signal to boost the speech elements (signal) and reduce the non-speech ones (noise). Speech enhancement techniques are evaluated using metrics that are either subjective (asking people their opinion of the enhanced signal) or objective (attempt to calculate metrics based on the signal itself). The subjective measures are better indicators of improved quality but do not scale well to large datasets. The objective metrics have mostly been constructed to attempt to model the subjective results. Our goal in this work is to establish a benchmark to assess the improvement of speech enhancement as it relates to the downstream task of automated speech recognition. In doing so, we retain the qualities of subjective measures while ensuring that evaluation can be done at a large scale in an automated fashion. We explore the impact of various noise types, including stationary, non-stationary, and a shift in noise distribution. We found that existing objective metrics are not a strong indicator of performance as it relates to an improvement in a downstream task. As such, we believe that Word Error Rate should be used when the downstream task is automated speech recognition.
引用
收藏
页数:7
相关论文
共 50 条
  • [1] Towards Robust Speech Emotion Recognition using Deep Residual Networks for Speech Enhancement
    Triantafyllopoulos, Andreas
    Keren, Gil
    Wagner, Johannes
    Steiner, Ingmar
    Schuller, Bjorn W.
    INTERSPEECH 2019, 2019, : 1691 - 1695
  • [2] COMPARISON OF DIFFERENT SPEECH ENHANCEMENT METHODS ON RECOGNITION OF NOISY SPEECH
    AHMED, MS
    ALMARZOUG, AM
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 1994, 19 (01): : 45 - 56
  • [3] Robust distributed speech recognition using speech enhancement
    Flynn, Ronan
    Jones, Edward
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2008, 54 (03) : 1267 - 1273
  • [4] Robust recognition of noisy speech using speech enhancement
    Xu, YF
    Zhang, JJ
    Yao, KS
    Cao, ZG
    Ma, ZX
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 734 - 737
  • [5] Comparative Evaluation of Speech Enhancement Methods for Robust Automatic Speech Recognition
    Paliwal, Kuldip K.
    Lyons, James G.
    So, Stephen
    Stark, Anthony P.
    Wojcicki, Kamil K.
    2010 4TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2010,
  • [6] Yi Language Speech Recognition using Deep Learning Methods
    Chen, Ziyan
    Yang, Hongwu
    PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 1064 - 1068
  • [7] Speech Enhancement for Speaker Recognition Using Deep Recurrent Neural Networks
    Tkachenko, Maxim
    Yamshinin, Alexander
    Lyubimov, Nikolay
    Kotov, Mikhail
    Nastasenko, Marina
    SPEECH AND COMPUTER, SPECOM 2017, 2017, 10458 : 690 - 699
  • [8] A Deep Learning Speech Enhancement Architecture Optimised for Speech Recognition and Hearing Aids
    Nossier, Soha A.
    Wall, Julie
    Moniri, Mansour
    Glackin, Cornelius
    Cannings, Nigel
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 553 - 558
  • [9] Speech Recognition using Deep Learning
    Lakkhanawannakun, Phoemporn
    Noyunsan, Chaluemwut
    2019 34TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2019), 2019, : 514 - 517
  • [10] SPEECH ENHANCEMENT USING END-TO-END SPEECH RECOGNITION OBJECTIVES
    Subramanian, Aswin Shanmugam
    Wang, Xiaofei
    Baskar, Murali Karthick
    Watanabe, Shinji
    Taniguchi, Toru
    Tran, Dung
    Fujita, Yuya
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 234 - 238