CP-GAN: CONTEXT PYRAMID GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT

Cited by: 0

Authors:

Liu, Gang [1]

Gong, Ke [2]

Liang, Xiaodan [1]

Chen, Zhiguang [1]

Affiliations:

[1] Sun Yat-sen University, Guangzhou, Guangdong, People's Republic of China

[2] DarkMatter AI Research, Guangzhou, Guangdong, People's Republic of China

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Keywords:

speech enhancement; generative adversarial network; context pyramid

DOI:

10.1109/icassp40776.2020.9054060

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Speech enhancement has improved considerably in recent years, especially with the development of generative adversarial networks (GANs). However, prior methods simply adopt GAN architectures from computer vision tasks without designs specific to speech enhancement that account for audio characteristics (i.e., context at different granularities), which may leave noise points in some segments or disturb the content of the original audio. In this work, we make the first attempt to explore global and local speech features for coarse-to-fine speech enhancement and introduce a Context Pyramid Generative Adversarial Network (CP-GAN), which contains a densely-connected feature pyramid generator and a dynamic context granularity discriminator to better eliminate audio noise hierarchically. Extensive experiments demonstrate that CP-GAN achieves state-of-the-art speech enhancement results and boosts the performance of higher-level speech tasks, including automatic speech recognition and speaker recognition.
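The abstract's notion of context at different granularities can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`average_pool`, `context_pyramid`, `random_local_segment`), the pooling factors, and the use of average pooling and random cropping are all assumptions chosen only to show how one signal can be viewed globally (coarse levels) and locally (short segments).

```python
import random


def average_pool(signal, factor):
    """Downsample by averaging non-overlapping windows of `factor` samples."""
    usable = len(signal) - len(signal) % factor  # drop any trailing remainder
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


def context_pyramid(signal, factors=(1, 2, 4)):
    """Return views of one signal, from fine (factor 1) to coarse (largest factor).

    Hypothetical helper: the factors are illustrative, not taken from the paper.
    """
    return [average_pool(signal, f) for f in factors]


def random_local_segment(signal, length, rng):
    """Crop one contiguous local segment of `length` samples at a random offset."""
    start = rng.randrange(len(signal) - length + 1)
    return signal[start:start + length]


waveform = [float(i % 8) for i in range(32)]  # toy stand-in for an audio clip
pyramid = context_pyramid(waveform)
segment = random_local_segment(waveform, 8, random.Random(0))

print([len(level) for level in pyramid])  # [32, 16, 8]
print(len(segment))                       # 8
```

In a GAN setting, the coarse levels would stand in for global context and the cropped segments for local context; how CP-GAN actually builds and combines these views is described in the paper itself, not in this record.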

Citation

Pages: 6624-6628

Page count: 5
