A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality

Cited by: 107
Authors
Martin-Donas, Juan Manuel [1]
Gomez, Angel Manuel [1]
Gonzalez, Jose A. [2]
Peinado, Antonio M. [1]
Affiliations
[1] Univ Granada, Dept Signal Theory Telemat & Commun, E-18071 Granada, Spain
[2] Univ Malaga, Dept Languages & Comp Sci, Malaga 29016, Spain
Keywords
Deep learning; loss function; speech enhancement; PESQ; DNN; ENHANCEMENT;
DOI
10.1109/LSP.2018.2871419
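Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract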
This letter proposes a perceptual metric for speech quality evaluation that is suitable as a loss function for training deep learning methods. The metric, derived from the perceptual evaluation of speech quality (PESQ) algorithm, is computed on a per-frame basis from the power spectra of the reference and processed speech signals. Two disturbance terms, which account for distortion once auditory masking and threshold effects are factored in, amend the mean square error (MSE) loss function by introducing perceptual criteria based on human psychoacoustics. The proposed loss function is evaluated for noisy speech enhancement with deep neural networks. Experimental results show that the metric achieves significant gains in speech quality (evaluated using an objective metric and a listening test) compared with MSE and with other perceptual-based loss functions from the literature.
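The structure described in the abstract, a per-frame MSE amended by two weighted disturbance terms, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration: the function name `perceptual_loss_sketch`, the weights `alpha` and `beta`, the `threshold` dead zone, and the use of log-power spectra for the MSE term are hypothetical. In particular, the two disturbance terms below are simplistic stand-ins and not the PESQ-derived definitions from the letter, which involve auditory masking and threshold modelling not reproduced here.

```python
import numpy as np

def perceptual_loss_sketch(ref_power, proc_power, alpha=0.1, beta=0.1,
                           threshold=1e-3):
    """Illustrative loss: per-frame MSE plus two weighted disturbance terms.

    ref_power, proc_power: arrays of shape (frames, bins) holding power spectra.
    alpha, beta, threshold: hypothetical values, not taken from the letter.
    Returns (loss, mse, d_sym, d_asym), each averaged over frames.
    """
    eps = 1e-10  # avoids log(0)

    # Per-frame MSE between log-power spectra (an assumed choice of domain).
    mse = np.mean((np.log(proc_power + eps) - np.log(ref_power + eps)) ** 2,
                  axis=1)

    # Stand-in for threshold effects: differences below `threshold` are ignored.
    diff = proc_power - ref_power
    dead_zone = np.maximum(np.abs(diff) - threshold, 0.0)

    # Stand-in symmetric disturbance: penalises any audible difference.
    d_sym = np.mean(dead_zone ** 2, axis=1)

    # Stand-in asymmetric disturbance: penalises only added energy.
    d_asym = np.mean(dead_zone * (diff > 0), axis=1)

    per_frame = mse + alpha * d_sym + beta * d_asym
    return (float(per_frame.mean()), float(mse.mean()),
            float(d_sym.mean()), float(d_asym.mean()))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.uniform(0.1, 1.0, size=(5, 8))    # 5 frames, 8 frequency bins
    proc = ref + rng.uniform(0.0, 0.2, size=ref.shape)
    print(perceptual_loss_sketch(ref, proc))
```

For training a network, the same structure would need to be written with differentiable tensor operations in the chosen deep learning framework; the NumPy version only shows how the terms combine.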
Pages: 1680-1684
Number of pages: 5