On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network

Cited by: 3
Authors
Faraji, Farnood [1 ]
Attabi, Yazid [1 ]
Champagne, Benoit [1 ]
Zhu, Wei-Ping [2 ]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
[2] Concordia Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Source
2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS) | 2020
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
audio fingerprinting; generative adversarial network; spectral subband centroids; speech enhancement; NOISE;
DOI
10.1109/sips50750.2020.9195238
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
The advent of learning-based methods in speech enhancement has revived the need for robust and reliable training features that can compactly represent speech signals while preserving their vital information. Time-frequency domain features, such as the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC), are preferred in many approaches. While the MFCCs provide a compact representation, they ignore the dynamics and distribution of energy in each mel-scale subband. In this work, a speech enhancement system based on a Generative Adversarial Network (GAN) is implemented and tested with a combination of Audio FingerPrinting (AFP) features obtained from the MFCC and the Normalized Spectral Subband Centroids (NSSC). The NSSC capture the locations of speech formants and complement the MFCC in a crucial way. In experiments with diverse speakers and noise types, GAN-based speech enhancement with the proposed AFP feature combination achieves the best objective performance while reducing memory requirements and training time.
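The NSSC feature the abstract describes is an energy-weighted mean frequency per subband. As a rough illustration (not the authors' implementation; the band layout, normalization scheme, and names `subband_centroids` and `band_edges` are assumptions for this sketch), a spectral subband centroid can be computed from a power spectrum like so:

```python
import numpy as np

def subband_centroids(power_spec, band_edges, freqs):
    """Energy-weighted mean frequency of each subband, normalized by
    the band's upper edge so values fall in [0, 1]. This is one common
    normalization; the paper's exact scheme may differ."""
    centroids = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        p = power_spec[mask]
        f = freqs[mask]
        # Centroid = sum(f * p) / sum(p); small epsilon guards empty bands.
        c = (f * p).sum() / (p.sum() + 1e-12)
        centroids.append(c / hi)
    return np.array(centroids)
```

A band dominated by a single spectral peak yields a centroid at that peak's frequency, which is why such centroids track formant locations and thus complement the energy-only view of the MFCC.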
Pages: 77-82
Page count: 6