Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition

Cited by: 1
Authors
Du, Zhihao [1]
Han, Jiqing [1]
Zhang, Xueliang [2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] Inner Mongolia Univ, Dept Comp Sci, Hohhot, Peoples R China
Source
INTERSPEECH 2020 | 2020
Funding
National Natural Science Foundation of China
Keywords
speech enhancement; adversarial training; speech recognition; CHiME-2; neural network
DOI
10.21437/Interspeech.2020-1504
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
To improve the noise robustness of automatic speech recognition (ASR), generative adversarial network (GAN) based enhancement methods are employed as front-end processing. These methods comprise a single adversarial process between an enhancement model and a discriminator. In this single adversarial process, the discriminator is encouraged to find differences between the enhanced and clean speech, but the distribution of clean speech is ignored. In this paper, we propose a double adversarial network (DAN) that adds another adversarial generation process (AGP), which forces the discriminator not only to find the differences but also to model the distribution. Furthermore, a functional mean square error (f-MSE) is proposed to utilize the representations learned by the discriminator. Experimental results reveal that AGP and f-MSE, both missing from previous GAN-based methods, are crucial for enhancement performance on the ASR task. Specifically, our DAN achieves a 13.00% relative word error rate improvement over the noisy speech on the CHiME-2 test set, significantly outperforming several recent GAN-based enhancement methods.
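The abstract reports a relative, not absolute, word error rate (WER) improvement. As a minimal sketch of how such a figure is conventionally computed, the snippet below assumes the usual definition, 100 * (baseline - system) / baseline. The function name and the WER values are hypothetical and chosen only for illustration; the abstract does not state the underlying absolute WERs.

```python
def relative_wer_improvement(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER improvement in percent.

    Assumed conventional definition: 100 * (baseline - system) / baseline.
    Both arguments are absolute WERs expressed in percent.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Hypothetical values for illustration only; they are NOT taken from the paper.
noisy_wer = 20.0      # WER when recognizing the unprocessed noisy speech
enhanced_wer = 17.4   # WER after a front-end enhancement model

improvement = relative_wer_improvement(noisy_wer, enhanced_wer)
print(f"{improvement:.2f}% relative WER improvement")
```

Under this definition, a relative improvement says how much of the baseline error was removed, so the same relative figure can correspond to very different absolute WER changes depending on the baseline.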
Pages: 309-313
Page count: 5
Related papers
50 in total
  • [1] Monaural speech separation based on MAXVQ and CASA for robust speech recognition
    Li, Peng
    Guan, Yong
    Wang, Shijin
    Xu, Bo
    Liu, Wenju
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01): 30-44
  • [2] EXPLORING SPEECH ENHANCEMENT WITH GENERATIVE ADVERSARIAL NETWORKS FOR ROBUST SPEECH RECOGNITION
    Donahue, Chris
    Li, Bo
    Prabhavalkar, Rohit
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5024-5028
  • [3] Adversarial Dictionary Learning for Monaural Speech Enhancement
    Ji, Yunyun
    Xu, Longting
    Zhu, Wei-Ping
    INTERSPEECH 2020, 2020: 4034-4038
  • [4] LOCAL TRAJECTORY BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION WITH DEEP NEURAL NETWORK
    You, Yongbin
    Qian, Yanmin
    Yu, Kai
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015: 5-9
  • [5] REINFORCEMENT LEARNING BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION
    Shen, Yih-Liang
    Huang, Chao-Yuan
    Wang, Syu-Siang
    Tsao, Yu
    Wang, Hsin-Min
    Chi, Tai-Shih
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6750-6754
  • [6] Convolutional fusion network for monaural speech enhancement
    Xian, Yang
    Sun, Yang
    Wang, Wenwu
    Naqvi, Syed Mohsen
    NEURAL NETWORKS, 2021, 143: 97-107
  • [7] Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition
    Wang, Ke
    Zhang, Junbo
    Sun, Sining
    Wang, Yujun
    Xiang, Fei
    Xie, Lei
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 1581-1585
  • [8] SPEECH RECOGNITION ROBUST AGAINST SPEECH OVERLAPPING IN MONAURAL RECORDINGS OF TELEPHONE CONVERSATIONS
    Suzuki, Masayuki
    Kurata, Gakuto
    Nagano, Tohru
    Tachibana, Ryuki
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016: 5685-5689
  • [9] Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition
    Kim, Geonmin
    Lee, Hwaran
    Kim, Bo-Kyeong
    Oh, Sang-Hoon
    Lee, Soo-Young
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (01): 159-163
  • [10] GSC Based Speech Enhancement with Generative Adversarial Network
    Zhou, Yao
    Bao, Changchun
    Cheng, Rui
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 901-906