Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Cited by: 19
Authors
Kim, Geonmin [1 ]
Lee, Hwaran [1 ]
Kim, Bo-Kyeong [1 ]
Oh, Sang-Hoon [2 ]
Lee, Soo-Young [3 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn, Daejeon 305701, South Korea
[2] Mokwon Univ, Div Informat & Commun Convergence Engn, Daejeon 302318, South Korea
[3] Korea Adv Inst Sci & Technol, Inst Artificial Intelligence, Daejeon 305701, South Korea
Keywords
Speech enhancement; room simulator; connectionist temporal classification; generative adversarial network
DOI
10.1109/LSP.2018.2880285
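Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract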
Many speech enhancement methods try to learn the relationship between noisy and clean speech, with the data obtained using an acoustic room simulator. We point out several limitations of enhancement methods that rely on clean speech targets; the goal of this letter is to propose an alternative learning algorithm, called acoustic and adversarial supervision (AAS). AAS makes the enhanced output both maximize the likelihood of the transcription under a pre-trained acoustic model and have the general characteristics of clean speech, which improves generalization to unseen noisy speech. We employ connectionist temporal classification and the unpaired conditional boundary equilibrium generative adversarial network as the loss functions of AAS. AAS is tested on two datasets containing additive noise without and with reverberation: Librispeech + DEMAND and CHiME-4, respectively. By visualizing the enhanced speech obtained with different loss combinations, we demonstrate the role of each supervision. AAS achieves a lower word error rate than other state-of-the-art methods that use the clean speech target on both datasets.
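As a rough illustration of the acoustic supervision named in the abstract, the sketch below computes a connectionist temporal classification (CTC) negative log-likelihood with the standard forward recursion and adds it to an adversarial term. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the plain weighted sum, and the `adv_weight` parameter are hypothetical, and the adversarial term is passed in as a precomputed number rather than derived from a boundary equilibrium GAN.

```python
import math


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """CTC negative log-likelihood via the forward recursion.

    probs  -- list of T frames; each frame is a list of per-symbol probabilities
    labels -- target label ids, without blanks
    blank  -- index of the blank symbol (assumed to be 0 here)
    """
    # Extended target: blanks interleaved around every label.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if num_states > 1 else 0.0)
    return -math.log(likelihood)


def combined_supervision_loss(acoustic_loss, adversarial_loss, adv_weight=1.0):
    """Hypothetical weighted sum of the two supervision terms (an assumption)."""
    return acoustic_loss + adv_weight * adversarial_loss


# Two frames, two symbols (0 = blank, 1 = "a"), uniform probabilities.
frames = [[0.5, 0.5], [0.5, 0.5]]
nll = ctc_neg_log_likelihood(frames, [1])
total = combined_supervision_loss(nll, adversarial_loss=0.2, adv_weight=0.5)
```

In this toy case three alignments collapse to the target ("a a", "a blank", "blank a"), each with probability 0.25, so the likelihood is 0.75. A practical system would compute both terms in log space inside a deep learning framework rather than with plain Python lists.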
Pages: 159-163
Page count: 5
Related papers
50 in total
  • [31] Single channel enhancement for speech recognition
    Droppo, Jasha
    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 94 - 98
  • [32] Controllable Conformer for Speech Enhancement and Recognition
    Guo, Zilu
    Du, Jun
    Siniscalchi, Sabato Marco
    Pan, Jia
    Liu, Qingfeng
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 156 - 160
  • [33] Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
    Lin, Ju
    Niu, Sufeng
    Wei, Zice
    Lan, Xiang
    van Wijngaarden, Adriaan J.
    Smith, Melissa C.
    Wang, Kuang-Ching
    INTERSPEECH 2019, 2019, : 3163 - 3167
  • [34] Time-domain speech enhancement using generative adversarial networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    SPEECH COMMUNICATION, 2019, 114 : 10 - 21
  • [35] On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network
    Faraji, Farnood
    Attabi, Yazid
    Champagne, Benoit
    Zhu, Wei-Ping
    2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2020, : 77 - 82
  • [36] Performance Analysis of Speech Enhancement Algorithm for Robust Speech Recognition System
    Babu, C. Ganesh
    Vanathi, P. T.
    Ramachandran, R.
    Rajaa, M. Senthil
    RECENT ADVANCES IN NETWORKING, VLSI AND SIGNAL PROCESSING, 2010, : 197 - +
  • [37] Combining speech enhancement and auditory feature extraction for robust speech recognition
    Kleinschmidt, M
    Tchorz, J
    Kollmeier, B
    SPEECH COMMUNICATION, 2001, 34 (1-2) : 75 - 91
  • [38] Towards Generalized Speech Enhancement with Generative Adversarial Networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    INTERSPEECH 2019, 2019, : 1791 - 1795
  • [39] Speech Enhancement Method Based On LSTM Neural Network for Speech Recognition
    Liu, Ming
    Wang, Yujun
    Wang, Jin
    Wang, Jing
    Xie, Xiang
    PROCEEDINGS OF 2018 14TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2018, : 245 - 249
  • [40] Comparative Evaluation of Speech Enhancement Methods for Robust Automatic Speech Recognition
    Paliwal, Kuldip K.
    Lyons, James G.
    So, Stephen
    Stark, Anthony P.
    Wojcicki, Kamil K.
    2010 4TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2010,