Improving GANs for Speech Enhancement

Cited by: 81
Authors
Phan, Huy [1 ]
McLoughlin, Ian V. [2 ]
Pham, Lam [3 ]
Chen, Oliver Y. [4 ]
Koch, Philipp [5 ]
De Vos, Maarten [6 ]
Mertins, Alfred [5 ]
Affiliations
[1] Queen Mary Univ London, London E1 4NS, England
[2] Singapore Inst Technol, Singapore 138683, Singapore
[3] Univ Kent, Canterbury CT2 7NZ, Kent, England
[4] Univ Oxford, Oxford OX1 2JD, England
[5] Univ Lubeck, D-23562 Lubeck, Germany
[6] KU Leuven, B-3000 Leuven, Belgium
Keywords
Generators; Noise measurement; Speech enhancement; Gallium nitride; Convolution; Decoding; Task analysis; generative adversarial networks; SEGAN; ISEGAN; DSEGAN; NETWORK; NOISE;
DOI
10.1109/LSP.2020.3025020
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes
0808; 0809;
Abstract
Generative adversarial networks (GANs) have recently been shown to be effective for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGANs) use a single generator to perform one-stage enhancement mapping. In this work, we propose chaining multiple generators to perform multi-stage enhancement mapping, gradually refining the noisy input signal in a stage-wise fashion. We study two scenarios: (1) the generators share their parameters, and (2) the generators' parameters are independent. The former constrains the generators to learn a common mapping that is applied iteratively at all enhancement stages, resulting in a small model footprint. In contrast, the latter allows the generators to learn different enhancement mappings at different stages of the network, at the cost of an increased model size. We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline, with the independent generators yielding more favorable results than the tied generators. The source code is available at http://github.com/pquochuy/idsegan.
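To make the chained-generator idea concrete, below is a minimal PyTorch sketch of the two parameter-sharing scenarios described in the abstract (the keywords refer to them as ISEGAN and DSEGAN). This is an illustration under stated assumptions, not the implementation in the linked repository: the `Generator` placeholder, the `MultiStageEnhancer` class, and all names are hypothetical, and the tiny convolutional net stands in for the full SEGAN encoder-decoder generator.

```python
import torch
import torch.nn as nn


class Generator(nn.Module):
    """Placeholder for a SEGAN-style generator (illustrative only)."""

    def __init__(self):
        super().__init__()
        # Length-preserving 1-D convolutions over a raw waveform.
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, padding=15),
            nn.PReLU(),
            nn.Conv1d(16, 1, kernel_size=31, padding=15),
        )

    def forward(self, x):
        return self.net(x)


class MultiStageEnhancer(nn.Module):
    """Chains n_stages generators so each stage refines the previous output.

    shared=True  -> one generator applied iteratively at every stage
                    (ISEGAN-style: common mapping, small footprint).
    shared=False -> an independent generator per stage
                    (DSEGAN-style: stage-specific mappings, larger model).
    """

    def __init__(self, n_stages=2, shared=False):
        super().__init__()
        if shared:
            g = Generator()
            # The same module object at every index: parameters are tied.
            self.stages = nn.ModuleList([g] * n_stages)
        else:
            # Separate modules: each stage learns its own mapping.
            self.stages = nn.ModuleList(Generator() for _ in range(n_stages))

    def forward(self, noisy):
        x = noisy
        outputs = []
        for g in self.stages:
            x = g(x)  # each stage refines the previous stage's estimate
            outputs.append(x)
        # Returning every stage's output is a choice made here for
        # inspectability; the abstract does not specify this detail.
        return outputs


# Usage: enhance a 1-second, 16 kHz waveform with two independent stages.
model = MultiStageEnhancer(n_stages=2, shared=False)
enhanced = model(torch.randn(1, 1, 16000))[-1]  # final-stage output
```

Note the trade-off the two branches encode: tying the parameters keeps the model no larger than a one-stage SEGAN while still applying the mapping repeatedly, whereas independent generators multiply the parameter count by the number of stages in exchange for stage-specific refinement.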
Pages: 1700-1704
Page count: 5