Weighted Von Mises Distribution-based Loss Function for Real-time STFT Phase Reconstruction Using DNN

Cited by: 0

Authors:

Thien, Nguyen Binh [1]

Wakabayashi, Yukoh [2]

Geng, Yuting [1]

Iwai, Kenta [1]

Nishiura, Takanobu [1]

Affiliations:

[1] Ritsumeikan Univ, Shiga, Japan

[2] Toyohashi Univ Technol, Toyohashi, Aichi, Japan

Source:

INTERSPEECH 2023 | 2023

Keywords:

Deep neural network; phase reconstruction; instantaneous frequency; group delay; von Mises distribution; CHANNEL SPEECH ENHANCEMENT; SIGNAL ESTIMATION; NETWORKS;

DOI:

10.21437/Interspeech.2023-580

Chinese Library Classification:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

This paper presents improvements to real-time phase reconstruction using deep neural networks (DNNs). The advantage of DNN-based approaches in phase reconstruction is that they can leverage prior knowledge from data and are adaptable to real-time applications by using causal models. However, conventional DNN-based methods do not consider the varying properties of the phase at different time-frequency bins. Our paper proposes loss functions for phase reconstruction that incorporate frequency-specific and amplitude weights to distinguish the importance of phase elements based on their properties. We also use an extension of the group delay to improve the phase connections along the frequency. To improve the generalization, we augment the data by randomly shifting the signals in the time domain for each epoch during training. Experimental results show the superior performance of the proposed methods compared to conventional DNN-based and non-DNN real-time phase reconstruction methods.
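The abstract does not give the loss formula, so the following is only a minimal sketch of one common way to build an amplitude-weighted phase loss from the von Mises distribution, not the authors' definition. It assumes a fixed concentration parameter, for which the negative log-likelihood reduces to a cosine distance between predicted and target phase, and it assumes the weights are the normalized STFT amplitudes. The function name `weighted_von_mises_loss` and the parameter `eps` are hypothetical; the frequency-specific weights and the group-delay extension mentioned in the abstract are not modeled.

```python
import numpy as np

def weighted_von_mises_loss(phase_pred, phase_true, amplitude, eps=1e-8):
    """Amplitude-weighted cosine loss between two phase arrays (radians).

    Equals the von Mises negative log-likelihood up to an additive
    constant and a scale, under the assumed fixed concentration.
    """
    # Assumption: weights are amplitudes normalized to sum to (about) one,
    # so low-energy time-frequency bins contribute less to the loss.
    weights = amplitude / (amplitude.sum() + eps)
    # 1 - cos(.) is 2*pi-periodic, so phase wrapping does not affect the loss.
    return float(np.sum(weights * (1.0 - np.cos(phase_pred - phase_true))))
```

Under these assumptions the loss is 0 for a perfect phase estimate and approaches 2 when every bin is off by π.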

Pages: 3864-3868

Page count: 5
