ARTIFICIAL BANDWIDTH EXTENSION USING A CONDITIONAL GENERATIVE ADVERSARIAL NETWORK WITH DISCRIMINATIVE TRAINING

Cited by: 0
Authors
Sautter, Jonas [1]
Faubel, Friedrich [1]
Buck, Markus [1]
Schmidt, Gerhard [2]
Affiliations
[1] Nuance Commun, Speech Signal Enhancement, D-89077 Ulm, Germany
[2] Univ Kiel, Digital Signal Proc & Syst Theory, D-24143 Kiel, Germany
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
artificial bandwidth extension; generative adversarial networks; discriminative training; neural networks
DOI
10.1109/icassp.2019.8682649
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The aim of artificial bandwidth extension is to recreate wideband speech (0-8 kHz) from a narrowband speech signal (0-4 kHz). State-of-the-art approaches use neural networks for this task. As a loss function during training, they employ the mean squared error between the true and estimated wideband spectra. This, however, has the drawback of over-smoothing, which shows up as strongly underestimated dynamics in the upper frequency band. We previously proposed to tackle this problem with discriminative training, i.e., a modification of the loss function designed to improve the separation between fricatives and vowels. Other authors instead took a generative adversarial network (GAN) approach, motivated by the fact that GANs have demonstrated large reductions of over-smoothing in speech synthesis. In this work, we combine these two approaches. In particular, we show that conditional GANs improve the speech quality by a CMOS score of 0.28 compared to GANs, while the combined approach yields an improvement of 0.84.
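The over-smoothing effect mentioned in the abstract can be illustrated with a small numerical sketch. It is not the authors' model: it assumes a hypothetical worst case in which one narrowband input is equally compatible with a fricative (strong upper band) and a vowel (weak upper band), and it uses the standard deviation of the upper-band level as a crude stand-in for "dynamics". The dB values and function names are made up for illustration.

```python
import numpy as np

def mse(prediction, targets):
    """Mean squared error of a single constant prediction against all targets."""
    return float(np.mean((targets - prediction) ** 2))

def dynamics(values):
    """Standard deviation, used here as a crude proxy for upper-band dynamics."""
    return float(np.std(values))

# Hypothetical upper-band levels in dB (assumed values, not from the paper):
# fricatives carry much more energy above 4 kHz than vowels do.
rng = np.random.default_rng(0)
fricative = rng.normal(-10.0, 1.0, size=500)
vowel = rng.normal(-40.0, 1.0, size=500)
targets = np.concatenate([fricative, vowel])

# If the narrowband input cannot tell the two classes apart, the estimator
# has to output one value for all frames. The MSE-optimal choice is the mean.
mse_optimal = float(np.mean(targets))
predictions = np.full_like(targets, mse_optimal)

# Committing to either class is worse in the MSE sense, although each of
# those outputs would be a plausible upper-band level on its own.
mse_mean = mse(mse_optimal, targets)
mse_fricative = mse(-10.0, targets)
mse_vowel = mse(-40.0, targets)

true_dynamics = dynamics(targets)
predicted_dynamics = dynamics(predictions)

print(f"MSE-optimal level: {mse_optimal:.1f} dB")
print(f"MSE (mean / fricative / vowel): "
      f"{mse_mean:.1f} / {mse_fricative:.1f} / {mse_vowel:.1f}")
print(f"dynamics (true / predicted): "
      f"{true_dynamics:.1f} / {predicted_dynamics:.1f} dB")
```

In this toy setting the MSE-optimal output sits between the two classes and has essentially no dynamics, which is the behaviour that discriminative and adversarial losses are meant to counteract.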
Pages: 7005-7009
Number of pages: 5