LANGUAGE AND NOISE TRANSFER IN SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK

Cited by: 0
Authors
Pascual, Santiago [1 ]
Park, Maruchan [2 ]
Serra, Joan [3 ]
Bonafonte, Antonio [1 ]
Ahn, Kang-Hun [2 ]
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Chungnam Natl Univ, Daejeon, South Korea
[3] Telefon Res, Barcelona, Spain
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Funding
National Research Foundation, Singapore;
Keywords
Speech enhancement; deep learning; transfer learning; generative adversarial networks; SPEAKER ADAPTATION;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems to new, low-resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data. We investigate the minimum requirements to obtain stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean. We also study the variability of test performance to unseen noise as a function of the number of different noise types available for training. Results show that adapting a pre-trained English model with 10 min of data already achieves performance comparable to having two orders of magnitude more data. They also demonstrate the relative stability in test performance with respect to the number of training noise types.
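The adaptation step described in the abstract, fine-tuning a pre-trained enhancement generator on roughly 10 min of target-language data, can be illustrated with a minimal sketch. The code below is a hypothetical PyTorch example, not the authors' SEGAN implementation: the toy architecture, checkpoint name, optimizer, and hyperparameters are all assumptions, and only an L1 regression term is shown in place of the full adversarial objective.

```python
# Minimal sketch of language/noise adaptation by fine-tuning a pre-trained
# enhancement generator on a small paired dataset. The Generator below is a
# toy 1-D conv autoencoder, NOT the actual SEGAN architecture; the checkpoint
# name and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class Generator(nn.Module):
    """Toy waveform-to-waveform enhancement network (stand-in for SEGAN's G)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15), nn.PReLU(),
            nn.Conv1d(16, 32, kernel_size=31, stride=2, padding=15), nn.PReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=32, stride=2, padding=15), nn.PReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=32, stride=2, padding=15), nn.Tanh(),
        )

    def forward(self, noisy):
        return self.dec(self.enc(noisy))

def finetune(generator, noisy, clean, epochs=10, lr=1e-4):
    """Adapt a pre-trained generator with a small paired noisy/clean set."""
    loader = DataLoader(TensorDataset(noisy, clean), batch_size=8, shuffle=True)
    opt = torch.optim.RMSprop(generator.parameters(), lr=lr)
    l1 = nn.L1Loss()  # regression term only; the adversarial term is omitted here
    generator.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = l1(generator(x), y)
            loss.backward()
            opt.step()
    return generator

if __name__ == "__main__":
    g = Generator()
    # g.load_state_dict(torch.load("segan_english_pretrained.pt"))  # hypothetical checkpoint
    # Random tensors stand in for a small amount of paired target-language audio.
    noisy = torch.randn(64, 1, 16384)
    clean = torch.randn(64, 1, 16384)
    finetune(g, noisy, clean, epochs=2)
```

In this setup, adapting to a new language or noise condition only means re-running the fine-tuning loop on the small target set while starting from the English-pretrained weights, rather than training from scratch.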
Pages: 5019-5023
Page count: 5