CP-GAN: CONTEXT PYRAMID GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT

Cited by: 0
Authors
Liu, Gang [1]
Gong, Ke [2]
Liang, Xiaodan [1]
Chen, Zhiguang [1]
Affiliations
[1] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
[2] DarkMatter AI Res, Guangzhou, Guangdong, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
speech enhancement; generative adversarial network; context pyramid;
DOI
10.1109/icassp40776.2020.9054060
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech enhancement has improved considerably in recent years, especially with the development of generative adversarial networks (GANs). However, prior methods simply follow GAN architectures from computer vision tasks without designs specific to speech enhancement that account for audio characteristics (i.e., context at different granularities), which may leave noise points in some segments or disturb the contents of the original audio. In this work, we make the first attempt to explore global and local speech features for coarse-to-fine speech enhancement and introduce a Context Pyramid Generative Adversarial Network (CP-GAN), which contains a densely-connected feature pyramid generator and a dynamic context granularity discriminator to better eliminate audio noise hierarchically. Extensive experiments demonstrate that CP-GAN achieves state-of-the-art speech enhancement results and boosts the performance of higher-level speech tasks, including automatic speech recognition and speaker recognition.
Pages: 6624-6628
Page count: 5