LANGUAGE AND NOISE TRANSFER IN SPEECH ENHANCEMENT GENERATIVE ADVERSARIAL NETWORK

Cited by: 0
Authors
Pascual, Santiago [1 ]
Park, Maruchan [2 ]
Serra, Joan [3 ]
Bonafonte, Antonio [1 ]
Ahn, Kang-Hun [2 ]
Affiliations
[1] Univ Politecn Cataluna, Barcelona, Spain
[2] Chungnam Natl Univ, Daejeon, South Korea
[3] Telefon Res, Barcelona, Spain
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Funding
National Research Foundation, Singapore;
Keywords
Speech enhancement; deep learning; transfer learning; generative adversarial networks; SPEAKER ADAPTATION;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Speech enhancement deep learning systems usually require large amounts of training data to operate in broad conditions or real applications. This makes the adaptability of those systems to new, low-resource environments an important topic. In this work, we present the results of adapting a speech enhancement generative adversarial network by fine-tuning the generator with small amounts of data. We investigate the minimum requirements to obtain stable behavior in terms of several objective metrics in two very different languages: Catalan and Korean. We also study the variability of test performance to unseen noise as a function of the number of different noise types available for training. Results show that adapting a pre-trained English model with 10 min of data already achieves performance comparable to having two orders of magnitude more data. They also demonstrate the relative stability in test performance with respect to the number of training noise types.
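The adaptation step described in the abstract, fine-tuning a pre-trained enhancement generator on roughly 10 min of target-language data, can be illustrated with a minimal sketch. The code below is a hypothetical PyTorch example, not the authors' SEGAN implementation: the toy architecture, checkpoint name, optimizer, and hyperparameters are all assumptions, and only an L1 regression term is shown in place of the full adversarial objective.

```python
# Minimal sketch of language/noise adaptation by fine-tuning a pre-trained
# enhancement generator on a small paired dataset. The Generator below is a
# toy 1-D conv autoencoder, NOT the actual SEGAN architecture; the checkpoint
# name and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class Generator(nn.Module):
    """Toy waveform-to-waveform enhancement network (stand-in for SEGAN's G)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=31, stride=2, padding=15), nn.PReLU(),
            nn.Conv1d(16, 32, kernel_size=31, stride=2, padding=15), nn.PReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=32, stride=2, padding=15), nn.PReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=32, stride=2, padding=15), nn.Tanh(),
        )

    def forward(self, noisy):
        return self.dec(self.enc(noisy))

def finetune(generator, noisy, clean, epochs=10, lr=1e-4):
    """Adapt a pre-trained generator with a small paired noisy/clean set."""
    loader = DataLoader(TensorDataset(noisy, clean), batch_size=8, shuffle=True)
    opt = torch.optim.RMSprop(generator.parameters(), lr=lr)
    l1 = nn.L1Loss()  # regression term only; the adversarial term is omitted here
    generator.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = l1(generator(x), y)
            loss.backward()
            opt.step()
    return generator

if __name__ == "__main__":
    g = Generator()
    # g.load_state_dict(torch.load("segan_english_pretrained.pt"))  # hypothetical checkpoint
    # Random tensors stand in for a small amount of paired target-language audio.
    noisy = torch.randn(64, 1, 16384)
    clean = torch.randn(64, 1, 16384)
    finetune(g, noisy, clean, epochs=2)
```

In this setup, adapting to a new language or noise condition only means re-running the fine-tuning loop on the small target set while starting from the English-pretrained weights, rather than training from scratch.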
Pages: 5019-5023
Page count: 5