Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited by: 19
Authors
Kim, Geonmin [1]
Lee, Hwaran [1]
Kim, Bo-Kyeong [1]
Oh, Sang-Hoon [2]
Lee, Soo-Young [3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 305701, South Korea
[2] Mokwon Univ, Div Informat & Commun Convergence Engn, Daejeon 302318, South Korea
[3] Korea Adv Inst Sci & Technol, Inst Artificial Intelligence, Daejeon 305701, South Korea
Keywords
Speech enhancement; room simulator; connectionist temporal classification; generative adversarial network
DOI
10.1109/LSP.2018.2880285
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Many speech enhancement methods try to learn the relationship between noisy and clean speech, where the paired data are obtained using an acoustic room simulator. We point out several limitations of enhancement methods that rely on clean speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS trains the enhanced output both to maximize the likelihood of the transcription under a pre-trained acoustic model and to have the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and the unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets containing additive noise, one without and one with reverberation: Librispeech + DEMAND and CHiME-4, respectively. By visualizing the enhanced speech obtained with different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use clean speech targets on both datasets.
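The abstract names two supervision signals but this record does not give the letter's equations, so the following is only a minimal sketch of how such a combined objective could be wired together. It assumes the standard boundary equilibrium GAN (BEGAN) bookkeeping, in which the discriminator is an autoencoder scored by reconstruction error and a balance term `k` is updated each step. The function names, the weight `adv_weight`, the default values of `gamma` and `lambda_k`, and the simple weighted sum of the two losses are all hypothetical illustrations, not the authors' formulation. The scalar inputs stand in for losses that a real system would compute from network outputs.

```python
def began_losses(recon_real, recon_fake, k):
    """Standard BEGAN losses from autoencoder reconstruction errors.

    recon_real: discriminator reconstruction error on clean speech
    recon_fake: discriminator reconstruction error on enhanced speech
    k:          current balance term in [0, 1]
    """
    d_loss = recon_real - k * recon_fake  # discriminator objective
    g_loss = recon_fake                   # adversarial term for the enhancer
    return d_loss, g_loss


def update_k(k, recon_real, recon_fake, gamma=0.5, lambda_k=0.001):
    """Standard BEGAN balance update, clipped to [0, 1]."""
    k_new = k + lambda_k * (gamma * recon_real - recon_fake)
    return min(max(k_new, 0.0), 1.0)


def enhancer_loss(ctc_loss, adv_loss, adv_weight=1.0):
    """Hypothetical combination: acoustic (CTC) term plus weighted adversarial term."""
    return ctc_loss + adv_weight * adv_loss


if __name__ == "__main__":
    k = 0.0
    recon_real, recon_fake, ctc = 0.8, 0.3, 2.5  # placeholder scalar losses
    d_loss, g_loss = began_losses(recon_real, recon_fake, k)
    total = enhancer_loss(ctc, g_loss, adv_weight=0.1)
    k = update_k(k, recon_real, recon_fake)
    print(d_loss, total, k)
```

In this sketch the acoustic term would pull the enhanced features toward what the pre-trained recognizer can transcribe, while the adversarial term would pull them toward the distribution of clean speech without requiring a paired clean target for each noisy utterance.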
Pages: 159-163
Number of pages: 5
Related papers
50 items in total
  • [21] Compensation of speech enhancement distortion for robust speech recognition
    Ding, P
    Cao, ZG
    2002 IEEE REGION 10 CONFERENCE ON COMPUTERS, COMMUNICATIONS, CONTROL AND POWER ENGINEERING, VOLS I-III, PROCEEDINGS, 2002, : 449 - 452
  • [22] Speech enhancement applied to speech recognition in noisy environments
    Xu, Y.F., 2001, Press of Tsinghua University (41):
  • [23] DUAL APPLICATION OF SPEECH ENHANCEMENT FOR AUTOMATIC SPEECH RECOGNITION
    Pandey, Ashutosh
    Liu, Chunxi
    Wang, Yun
    Saraf, Yatharth
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 223 - 228
  • [24] CONSTRAINED ITERATIVE SPEECH ENHANCEMENT WITH APPLICATION TO SPEECH RECOGNITION
    HANSEN, JHL
    CLEMENTS, MA
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1991, 39 (04) : 795 - 805
  • [25] Robust recognition of noisy speech using speech enhancement
    Xu, YF
    Zhang, JJ
    Yao, KS
    Cao, ZG
    Ma, ZX
    2000 5TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I-III, 2000, : 734 - 737
  • [26] ADVERSARIAL LEARNING OF RAW SPEECH FEATURES FOR DOMAIN INVARIANT SPEECH RECOGNITION
    Tripathi, Aditay
    Mohan, Aanchan
    Anand, Saket
    Singh, Maneesh
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5959 - 5963
  • [27] Adversarial Dictionary Learning for Monaural Speech Enhancement
    Ji, Yunyun
    Xu, Longting
    Zhu, Wei-Ping
    INTERSPEECH 2020, 2020, : 4034 - 4038
  • [28] CycleGAN-based speech enhancement for the unpaired training data
    Yuan, Jing
    Bao, Changchun
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 878 - 883
  • [29] Adversarial Latent Representation Learning for Speech Enhancement
    Qiu, Yuanhang
    Wang, Ruili
    INTERSPEECH 2020, 2020, : 2662 - 2666
  • [30] ATT: Adversarial Trained Transformer for Speech Enhancement
    Aitawade, Aniket
    Bharati, Puja
    Chandra, Sabyasachi
    Prasad, G. Satya
    Pramanik, Debolina
    Khadse, Parth Sanjay
    Das Mandal, Shyamal Kumar
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 258 - 270