Improving GANs for Speech Enhancement

Cited by: 81
Authors
Phan, Huy [1 ]
McLoughlin, Ian V. [2 ]
Pham, Lam [3 ]
Chen, Oliver Y. [4 ]
Koch, Philipp [5 ]
De Vos, Maarten [6 ]
Mertins, Alfred [5 ]
Affiliations
[1] Queen Mary Univ London, London E1 4NS, England
[2] Singapore Inst Technol, Singapore 138683, Singapore
[3] Univ Kent, Canterbury CT2 7NZ, Kent, England
[4] Univ Oxford, Oxford OX1 2JD, England
[5] Univ Lubeck, D-23562 Lubeck, Germany
[6] KU Leuven, B-3000 Leuven, Belgium
Keywords
Generators; Noise measurement; Speech enhancement; Gallium nitride; Convolution; Decoding; Task analysis; generative adversarial networks; SEGAN; ISEGAN; DSEGAN; NETWORK; NOISE;
DOI
10.1109/LSP.2020.3025020
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code
0808; 0809;
Abstract
Generative adversarial networks (GANs) have recently been shown to be effective for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGAN) use a single generator to perform a one-stage enhancement mapping. In this work, we propose to use multiple generators chained to perform a multi-stage enhancement mapping, which gradually refines the noisy input signal in a stage-wise fashion. Furthermore, we study two scenarios: (1) the generators share their parameters and (2) the generators' parameters are independent. The former constrains the generators to learn a common mapping that is iteratively applied at all enhancement stages and results in a small model footprint. In contrast, the latter allows the generators to learn different enhancement mappings at different stages of the network, at the cost of an increased model size. We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline, with the independent generators leading to more favorable results than the tied generators. The source code is available at http://github.com/pquochuy/idsegan.
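To illustrate the chained-generator idea described in the abstract, the following is a minimal PyTorch sketch. The class and function names (TinyGenerator, build_chain, enhance) and the tiny two-layer generator are illustrative assumptions, not the authors' implementation; the real SEGAN generator is a deep convolutional encoder-decoder with skip connections, trained adversarially (see the linked repository). Sharing one module across stages corresponds to the tied-parameter (iterated) variant, while instantiating a separate generator per stage corresponds to the independent-parameter variant.

```python
# Hypothetical sketch of multi-stage (chained) enhancement generators.
# Not the authors' code; see http://github.com/pquochuy/idsegan for the original.
import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """Stand-in 1-D convolutional mapping; the real SEGAN generator is far deeper."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=31, padding=15),
            nn.PReLU(),
            nn.Conv1d(channels, 1, kernel_size=31, padding=15),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def build_chain(num_stages: int, share_parameters: bool) -> nn.ModuleList:
    """Tied variant: one generator reused at every stage (small footprint).
    Independent variant: a separate generator per stage (larger model)."""
    if share_parameters:
        g = TinyGenerator()
        return nn.ModuleList([g] * num_stages)  # same module applied iteratively
    return nn.ModuleList([TinyGenerator() for _ in range(num_stages)])


def enhance(noisy: torch.Tensor, generators: nn.ModuleList) -> torch.Tensor:
    """Each stage refines the output of the previous stage."""
    x = noisy
    for g in generators:
        x = g(x)
    return x


if __name__ == "__main__":
    noisy = torch.randn(4, 1, 16384)  # toy batch of raw waveform segments
    chain = build_chain(num_stages=2, share_parameters=False)
    print(enhance(noisy, chain).shape)  # torch.Size([4, 1, 16384])
```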
Pages: 1700 - 1704
Page count: 5
Related papers
35 records in total
  • [11] Speech Enhancement Using a Two-Stage Network for an Efficient Boosting Strategy
    Kim, Juntae
    Hahn, Minsoo
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (05) : 770 - 774
  • [12] Kingma D. P., Ba J., 2014, Adam: A Method for Stochastic Optimization
  • [13] Speech Enhancement In Multiple-Noise Conditions using Deep Neural Networks
    Kumar, Anurag
    Florencio, Dinei
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3738 - 3742
  • [14] A Conditional Generative Model for Speech Enhancement
    Li, Zeng-Xi
    Dai, Li-Rong
    Song, Yan
    McLoughlin, Ian
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2018, 37 (11) : 5005 - 5022
  • [15] Lim J. S., 1978, IEEE Transactions on Acoustics, Speech, and Signal Processing, V26, P197, DOI 10.1109/TASSP.1978.1163086
  • [16] Loizou P. C., 2013, Speech Enhancement: Theory and Practice, DOI 10.1201/b14529
  • [17] Mamun N., 2019, INTERSPEECH, P4265, DOI 10.21437/Interspeech.2019-1850
  • [18] Least Squares Generative Adversarial Networks
    Mao, Xudong
    Li, Qing
    Xie, Haoran
    Lau, Raymond Y. K.
    Wang, Zhen
    Smolley, Stephen Paul
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2813 - 2821
  • [19] A Fully Convolutional Neural Network for Speech Enhancement
    Park, Se Rim
    Lee, Jin Won
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1993 - 1997
  • [20] Towards Generalized Speech Enhancement with Generative Adversarial Networks
    Pascual, Santiago
    Serra, Joan
    Bonafonte, Antonio
    [J]. INTERSPEECH 2019, 2019, : 1791 - 1795