Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited: 19
Authors
Kim, Geonmin [1]
Lee, Hwaran [1]
Kim, Bo-Kyeong [1]
Oh, Sang-Hoon [2]
Lee, Soo-Young [3]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 305701, South Korea
[2] Mokwon Univ, Div Informat & Commun Convergence Engn, Daejeon 302318, South Korea
[3] Korea Adv Inst Sci & Technol, Inst Artificial Intelligence, Daejeon 305701, South Korea
Keywords
Speech enhancement; room simulator; connectionist temporal classification; generative adversarial network
DOI
10.1109/LSP.2018.2880285
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification
0808; 0809
Abstract
Many speech enhancement methods attempt to learn the mapping between noisy and clean speech pairs obtained with an acoustic room simulator. We point out several limitations of enhancement methods that rely on clean speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS trains the enhancement network so that its output both maximizes the likelihood of the transcription under a pre-trained acoustic model and exhibits the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and an unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets, one with additive noise only and one with additive noise plus reverberation: Librispeech + DEMAND and CHiME-4. By visualizing the enhanced speech under different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use clean speech targets on both datasets.
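The abstract describes a two-term objective on the enhancer output: a connectionist temporal classification (CTC) loss from a frozen, pre-trained acoustic model (acoustic supervision), plus an adversarial loss that requires no paired clean target (adversarial supervision). Below is a minimal sketch, assuming a PyTorch-style setup; the names enhancer, acoustic_model, discriminator, and the weight lambda_adv are hypothetical and not from the letter, and a generic generator loss stands in for the unpaired conditional BEGAN term the paper actually uses.

```python
import torch.nn.functional as F

def aas_generator_loss(enhancer, acoustic_model, discriminator,
                       noisy_feats, input_lens, targets, target_lens,
                       lambda_adv=0.1):
    """Sketch of the AAS objective: CTC (acoustic) + adversarial supervision.

    All module and parameter names here are illustrative assumptions.
    """
    enhanced = enhancer(noisy_feats)  # (batch, time, feat)

    # Acoustic supervision: CTC likelihood of the transcription under a
    # frozen, pre-trained acoustic model (no clean speech target needed).
    log_probs = F.log_softmax(acoustic_model(enhanced), dim=-1)
    ctc = F.ctc_loss(log_probs.transpose(0, 1),  # ctc_loss expects (time, batch, vocab)
                     targets, input_lens, target_lens)

    # Adversarial supervision: push the enhanced output toward the general
    # characteristics of clean speech. A generic GAN generator loss is used
    # here; the letter itself uses an unpaired conditional BEGAN objective.
    adv = -discriminator(enhanced).mean()

    return ctc + lambda_adv * adv
```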
Pages: 159-163
Page count: 5