Weighted Von Mises Distribution-based Loss Function for Real-time STFT Phase Reconstruction Using DNN

Times cited: 0
Authors
Thien, Nguyen Binh [1]
Wakabayashi, Yukoh [2]
Geng, Yuting [1]
Iwai, Kenta [1]
Nishiura, Takanobu [1]
Affiliations
[1] Ritsumeikan Univ, Shiga, Japan
[2] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
Source
INTERSPEECH 2023 | 2023
Keywords
Deep neural network; phase reconstruction; instantaneous frequency; group delay; von Mises distribution; CHANNEL SPEECH ENHANCEMENT; SIGNAL ESTIMATION; NETWORKS
DOI
10.21437/Interspeech.2023-580
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents improvements to real-time phase reconstruction using deep neural networks (DNNs). DNN-based approaches to phase reconstruction can leverage prior knowledge from data and, by using causal models, can be adapted to real-time applications. However, conventional DNN-based methods do not account for how the properties of the phase vary across time-frequency bins. We propose loss functions for phase reconstruction that incorporate frequency-specific and amplitude weights, so that phase elements are weighted according to their properties. We also use an extension of the group delay to improve the phase connections along the frequency axis. To improve generalization, we augment the data by randomly shifting the signals in the time domain at each epoch during training. Experimental results show that the proposed methods outperform conventional DNN-based and non-DNN real-time phase reconstruction methods.
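The abstract does not give the loss formulas, so the following is only a minimal sketch of what an amplitude- and frequency-weighted von Mises-style phase loss might look like. It assumes the common form in which the negative von Mises log-likelihood reduces, up to constants, to a negative cosine of the phase error. The function name `weighted_von_mises_loss`, the `(frequency, time)` array layout, and the weight normalization are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def weighted_von_mises_loss(phase_pred, phase_true, amplitude, freq_weights=None):
    """Weighted negative-cosine phase loss (hypothetical sketch, not the paper's exact loss).

    phase_pred, phase_true, amplitude: arrays of shape (n_freq, n_frames).
    freq_weights: optional array of shape (n_freq,), one weight per frequency bin.
    """
    # Assumption: bins with larger amplitude matter more, so the amplitude is the weight.
    weights = np.asarray(amplitude, dtype=float)
    if freq_weights is not None:
        # Broadcast the per-frequency weights across all time frames.
        weights = weights * np.asarray(freq_weights, dtype=float)[:, None]
    # Normalize so the loss lies in [-1, 1] regardless of signal level.
    weights = weights / (weights.sum() + 1e-12)
    # The cosine is 2*pi-periodic, so phase wrapping needs no special handling.
    return float(-(weights * np.cos(phase_pred - phase_true)).sum())

# Usage with random placeholder data (not real speech features).
rng = np.random.default_rng(0)
n_freq, n_frames = 5, 8
phase_true = rng.uniform(-np.pi, np.pi, size=(n_freq, n_frames))
phase_pred = phase_true + 0.1 * rng.standard_normal((n_freq, n_frames))
amplitude = rng.uniform(0.1, 1.0, size=(n_freq, n_frames))
freq_weights = np.linspace(1.0, 0.5, n_freq)  # hypothetical per-bin weights
print(weighted_von_mises_loss(phase_pred, phase_true, amplitude, freq_weights))
```

With this normalization a perfect prediction gives a loss close to -1 and a prediction that is off by pi in every bin gives a loss close to +1. The same function could in principle be applied to phase differences such as instantaneous frequency or group delay, but how the paper combines those terms is not stated in the abstract.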
Pages: 3864-3868
Number of pages: 5