Improving GANs for Speech Enhancement

Cited by: 81
Authors
Phan, Huy [1 ]
McLoughlin, Ian V. [2 ]
Pham, Lam [3 ]
Chen, Oliver Y. [4 ]
Koch, Philipp [5 ]
De Vos, Maarten [6 ]
Mertins, Alfred [5 ]
Affiliations
[1] Queen Mary Univ London, London E1 4NS, England
[2] Singapore Inst Technol, Singapore 138683, Singapore
[3] Univ Kent, Canterbury CT2 7NZ, Kent, England
[4] Univ Oxford, Oxford OX1 2JD, England
[5] Univ Lubeck, D-23562 Lubeck, Germany
[6] KU Leuven, B-3000 Leuven, Belgium
Keywords
Generators; Noise measurement; Speech enhancement; Gallium nitride; Convolution; Decoding; Task analysis; generative adversarial networks; SEGAN; ISEGAN; DSEGAN; NETWORK; NOISE;
DOI
10.1109/LSP.2020.3025020
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes
0808; 0809;
Abstract
Generative adversarial networks (GANs) have recently been shown to be effective for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGANs) use a single generator to perform one-stage enhancement mapping. In this work, we propose chaining multiple generators to perform multi-stage enhancement mapping, gradually refining the noisy input signal in a stage-wise fashion. We study two scenarios: (1) the generators share their parameters, and (2) the generators' parameters are independent. The former constrains the generators to learn a common mapping that is applied iteratively at all enhancement stages, resulting in a small model footprint. In contrast, the latter allows the generators to learn different enhancement mappings at different stages of the network, at the cost of an increased model size. We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline, with the independent generators yielding more favorable results than the tied generators. The source code is available at http://github.com/pquochuy/idsegan.
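To make the chained-generator idea concrete, below is a minimal PyTorch sketch of the two parameter-sharing scenarios described in the abstract (the keywords refer to them as ISEGAN and DSEGAN). This is an illustration under stated assumptions, not the implementation in the linked repository: the `Generator` placeholder, the `MultiStageEnhancer` class, and all names are hypothetical, and the tiny convolutional net stands in for the full SEGAN encoder-decoder generator.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Placeholder for a SEGAN-style generator (illustrative only)."""

    def __init__(self):
        super().__init__()
        # Length-preserving 1-D convolutions over a raw waveform.
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, padding=15),
            nn.PReLU(),
            nn.Conv1d(16, 1, kernel_size=31, padding=15),
        )

    def forward(self, x):
        return self.net(x)


class MultiStageEnhancer(nn.Module):
    """Chains n_stages generators so each stage refines the previous output.

    shared=True  -> one generator applied iteratively at every stage
                    (ISEGAN-style: common mapping, small footprint).
    shared=False -> an independent generator per stage
                    (DSEGAN-style: stage-specific mappings, larger model).
    """

    def __init__(self, n_stages=2, shared=False):
        super().__init__()
        if shared:
            g = Generator()
            # The same module object at every index: parameters are tied.
            self.stages = nn.ModuleList([g] * n_stages)
        else:
            # Separate modules: each stage learns its own mapping.
            self.stages = nn.ModuleList(Generator() for _ in range(n_stages))

    def forward(self, noisy):
        x = noisy
        outputs = []
        for g in self.stages:
            x = g(x)  # each stage refines the previous stage's estimate
            outputs.append(x)
        # Returning every stage's output is a choice made here for
        # inspectability; the abstract does not specify this detail.
        return outputs


# Usage: enhance a 1-second, 16 kHz waveform with two independent stages.
model = MultiStageEnhancer(n_stages=2, shared=False)
enhanced = model(torch.randn(1, 1, 16000))[-1]  # final-stage output
```

Note the trade-off the two branches encode: tying the parameters keeps the model no larger than a one-stage SEGAN while still applying the mapping repeatedly, whereas independent generators multiply the parameter count by the number of stages in exchange for stage-specific refinement.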
Pages: 1700-1704
Page count: 5