ARTIFICIAL BANDWIDTH EXTENSION USING A CONDITIONAL GENERATIVE ADVERSARIAL NETWORK WITH DISCRIMINATIVE TRAINING

Cited by: 0

Authors:

Sautter, Jonas [1]

Faubel, Friedrich [1]

Buck, Markus [1]

Schmidt, Gerhard [2]

Affiliations:

[1] Nuance Commun, Speech Signal Enhancement, D-89077 Ulm, Germany

[2] Univ Kiel, Digital Signal Proc & Syst Theory, D-24143 Kiel, Germany

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

artificial bandwidth extension; generative adversarial networks; discriminative training; NEURAL-NETWORKS;

DOI:

10.1109/icassp.2019.8682649

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The aim of artificial bandwidth extension is to recreate wideband speech (0-8 kHz) from a narrowband speech signal (0-4 kHz). State-of-the-art approaches use neural networks for this task. As a loss function during training, they employ the mean squared error between true and estimated wideband spectra. This, however, comes with the drawback of over-smoothing, which expresses itself in strongly underestimated dynamics of the upper frequency band. We previously proposed to tackle this problem by discriminative training, i.e., a modification of the loss function that is designed to improve the separation between fricatives and vowels. Other authors instead took a generative adversarial network (GAN) approach. This was motivated by the fact that GANs demonstrated substantial reductions of over-smoothing in speech synthesis. In this work, we combine these two approaches. In particular, we show that conditional GANs improve the speech quality by a CMOS score of 0.28 compared to GANs, while the combined approach yields an improvement of 0.84.
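
The over-smoothing effect attributed to the mean squared error can be illustrated with a toy calculation. The sketch below is not the authors' model or data: the energy values, the `mse` helper, and the assumption that several frames share an identical narrowband input are all hypothetical, chosen only to show why an MSE-optimal estimate collapses toward the average and loses upper-band dynamics.

```python
from statistics import mean, pvariance

# Hypothetical upper-band energies (in dB) for four frames whose narrowband
# input is assumed to look identical: two vowel-like (low energy) and two
# fricative-like (high energy). These numbers are illustrative only.
targets_db = [-40.0, -40.0, -10.0, -10.0]


def mse(estimate, targets):
    """Mean squared error of one constant estimate against all targets."""
    return mean((estimate - t) ** 2 for t in targets)


# For an ambiguous input, the constant estimate that minimizes the MSE
# is the mean of the possible targets.
best_estimate = mean(targets_db)

# Compare the averaged estimate with committing to either true level.
candidates = [-40.0, -25.0, -10.0]
losses = {c: mse(c, targets_db) for c in candidates}

# Dynamics (variance) of the true targets versus the MSE-optimal estimates.
true_dynamics = pvariance(targets_db)
estimated_dynamics = pvariance([best_estimate] * len(targets_db))

print(best_estimate, losses, true_dynamics, estimated_dynamics)
```

In this toy setting the averaged estimate has the lowest loss even though it matches neither the vowel-like nor the fricative-like level, and the variance of the estimates drops to zero. That is the kind of underestimated dynamics that discriminative training and adversarial losses are meant to counteract.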


Pages: 7005-7009

Page count: 5
