On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network

Cited by: 3
Authors
Faraji, Farnood [1 ]
Attabi, Yazid [1 ]
Champagne, Benoit [1 ]
Zhu, Wei-Ping [2 ]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
[2] Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Source
2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS) | 2020
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
audio fingerprinting; generative adversarial network; spectral subband centroids; speech enhancement; NOISE;
DOI
10.1109/sips50750.2020.9195238
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
The advent of learning-based methods in speech enhancement has revived the need for robust and reliable training features that can compactly represent speech signals while preserving their vital information. Time-frequency domain features, such as the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC), are preferred in many approaches. While the MFCCs provide a compact representation, they ignore the dynamics and distribution of energy in each mel-scale subband. In this work, a speech enhancement system based on a Generative Adversarial Network (GAN) is implemented and tested with a combination of Audio FingerPrinting (AFP) features obtained from the MFCC and the Normalized Spectral Subband Centroids (NSSC). The NSSC capture the locations of speech formants and complement the MFCC in a crucial way. In experiments with diverse speakers and noise types, GAN-based speech enhancement with the proposed AFP feature combination achieves the best objective performance while reducing memory requirements and training time.
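The NSSC feature the abstract describes is an energy-weighted mean frequency per subband. As a rough illustration (not the authors' implementation; the band layout, normalization scheme, and names `subband_centroids` and `band_edges` are assumptions for this sketch), a spectral subband centroid can be computed from a power spectrum like so:

```python
import numpy as np

def subband_centroids(power_spec, band_edges, freqs):
    """Energy-weighted mean frequency of each subband, normalized by
    the band's upper edge so values fall in [0, 1]. This is one common
    normalization; the paper's exact scheme may differ."""
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        p = power_spec[mask]
        f = freqs[mask]
        # Centroid = sum(f * p) / sum(p); small epsilon guards empty bands.
        c = (f * p).sum() / (p.sum() + 1e-12)
        centroids.append(c / hi)
    return np.array(centroids)
```

A band dominated by a single spectral peak yields a centroid at that peak's frequency, which is why such centroids track formant locations and thus complement the energy-only view of the MFCC.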
Pages: 77-82
Page count: 6