Weighted Von Mises Distribution-based Loss Function for Real-time STFT Phase Reconstruction Using DNN

Cited by: 0
Authors
Thien, Nguyen Binh [1 ]
Wakabayashi, Yukoh [2 ]
Geng Yuting [1 ]
Iwai, Kenta [1 ]
Nishiura, Takanobu [1 ]
Affiliations
[1] Ritsumeikan Univ, Shiga, Japan
[2] Toyohashi Univ Technol, Toyohashi, Aichi, Japan
Source
INTERSPEECH 2023 | 2023
Keywords
Deep neural network; phase reconstruction; instantaneous frequency; group delay; von Mises distribution; channel speech enhancement; signal estimation; networks
DOI
10.21437/Interspeech.2023-580
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline Classification Codes
070206; 082403
Abstract
This paper presents improvements to real-time phase reconstruction using deep neural networks (DNNs). The advantage of DNN-based approaches to phase reconstruction is that they can leverage prior knowledge from data and can be adapted to real-time applications by using causal models. However, conventional DNN-based methods do not account for the varying properties of the phase at different time-frequency bins. This paper proposes loss functions for phase reconstruction that incorporate frequency-specific and amplitude weights to distinguish the importance of phase elements based on their properties. We also use an extension of the group delay to improve phase connections along the frequency axis. To improve generalization, we augment the data by randomly shifting the signals in the time domain at each epoch during training. Experimental results show that the proposed methods outperform conventional DNN-based and non-DNN real-time phase reconstruction methods.
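The weighted loss described in the abstract can be illustrated with a minimal sketch: the von Mises negative log-likelihood of a phase error reduces (up to constants) to -cos(Δφ), and each time-frequency bin is weighted by its amplitude and an optional frequency-dependent weight. The function name, tensor shapes, and the specific weighting scheme below are illustrative assumptions, not the authors' implementation.

```python
import torch

def weighted_von_mises_phase_loss(phase_est, phase_ref, amplitude, freq_weight=None):
    """Amplitude-weighted von Mises-style phase loss (illustrative sketch).

    The von Mises negative log-likelihood of the phase error reduces, up to
    constants, to -cos(phase_est - phase_ref). Weighting each T-F bin by its
    amplitude (and optionally a frequency-dependent weight) emphasizes bins
    where the phase is perceptually and physically more meaningful.

    Assumed shapes (hypothetical): all tensors are (batch, freq, time);
    freq_weight, if given, broadcasts as (1, freq, 1).
    """
    # 1 - cos(error) lies in [0, 2] and is minimized when phases match.
    cosine_err = 1.0 - torch.cos(phase_est - phase_ref)
    weight = amplitude
    if freq_weight is not None:
        weight = weight * freq_weight
    # Normalize by the total weight so the loss scale is independent of signal level.
    return (weight * cosine_err).sum() / weight.sum().clamp_min(1e-8)
```

In this sketch, low-amplitude bins (where the phase is close to random) contribute little to the gradient, which is the intuition behind the amplitude weighting described in the abstract.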
Pages: 3864-3868
Number of pages: 5