A Dual Stream Generative Adversarial Network with Phase Awareness for Speech Enhancement

Cited by: 0
Authors
Liang, Xintao [1]
Li, Yuhang [1]
Li, Xiaomin [1]
Zhang, Yue [1]
Ding, Youdong [1]
Affiliations
[1] Shanghai Univ, Shanghai Film Acad, Shanghai 200072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speech enhancement; GAN; transformer; phase; spectrogram; dual stream; intelligibility; optimization; algorithm; noise
DOI
10.3390/info14040221
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Single-channel speech enhancement under unknown noise conditions is a challenging problem. Most existing time-frequency domain methods operate on the amplitude spectrogram and ignore the phase mismatch between noisy and clean speech, which substantially limits enhancement performance. To address the phase mismatch and further improve enhancement performance, this paper proposes a phase-aware dual-stream Generative Adversarial Network (GAN), named DPGAN. The generator uses a dual-stream structure to predict amplitude and phase separately, with an information communication module between the two streams so that the phase information is fully exploited. To make prediction more efficient, the generator is built with Transformers, which can learn the structural properties of sound more easily. Finally, a perceptually guided discriminator quantitatively evaluates speech quality, optimizing the generator for specific evaluation metrics. Experiments on the widely used Voicebank-DEMAND dataset show that DPGAN achieves state-of-the-art results on most metrics.
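The phase mismatch described in the abstract can be illustrated with a small numerical sketch. This is not the DPGAN model and is not taken from the paper: the helper names `combine` and `phase_mismatch_error`, the two toy spectrogram bins, and the fixed phase offset of pi/3 are all assumptions chosen for illustration. The sketch recombines per-bin magnitude and phase in polar form and measures the error that remains when exact clean magnitudes are paired with a mismatched phase.

```python
import cmath
import math


def combine(magnitude, phase):
    """Recombine per-bin magnitude and phase into complex spectrogram values."""
    return [cmath.rect(m, p) for m, p in zip(magnitude, phase)]


def phase_mismatch_error(clean_bins, used_phase):
    """Mean squared error when clean magnitudes are paired with `used_phase`."""
    mags = [abs(c) for c in clean_bins]
    rebuilt = combine(mags, used_phase)
    return sum(abs(c - r) ** 2 for c, r in zip(clean_bins, rebuilt)) / len(clean_bins)


# Two hypothetical time-frequency bins of a clean spectrogram (toy values).
clean = [cmath.rect(1.0, 0.0), cmath.rect(2.0, math.pi / 2)]
clean_phase = [cmath.phase(c) for c in clean]

# Stand-in for the phase of a noisy mixture: the clean phase shifted by pi/3.
shifted_phase = [p + math.pi / 3 for p in clean_phase]

err_matched = phase_mismatch_error(clean, clean_phase)
err_mismatched = phase_mismatch_error(clean, shifted_phase)
```

Even with perfect magnitudes, the mismatched phase leaves a nonzero reconstruction error, which is the limitation of amplitude-only methods that a separate phase stream is meant to address.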
Pages: 21