Inter-Frequency Phase Difference for Phase Reconstruction Using Deep Neural Networks and Maximum Likelihood

Cited by: 3
Authors
Thien, Nguyen Binh [1]
Wakabayashi, Yukoh [2]
Iwai, Kenta [3]
Nishiura, Takanobu [3]
Affiliations
[1] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, Kusatsu, Shiga 5258577, Japan
[2] Toyohashi Univ Technol, Dept Comp Sci & Engn, Toyohashi 4418580, Japan
[3] Ritsumeikan Univ, Coll Informat Sci & Engn, Kusatsu, Shiga 5258577, Japan
Keywords
Spectrogram; Reconstruction algorithms; Wrapping; Delays; Time-frequency analysis; Sensitivity; Neural networks; Two-stage phase estimation; instantaneous frequency; group delay; weighted least squares; von Mises distribution; CHANNEL SPEECH ENHANCEMENT; SIGNAL; MODEL
DOI
10.1109/TASLP.2023.3268577
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper presents improvements to two-stage algorithms that use deep neural networks (DNNs) to estimate the short-time Fourier transform (STFT) phase from the amplitude alone. The phase is difficult to reconstruct directly because it is sensitive to waveform shifts and subject to phase wrapping. To mitigate these problems, two-stage approaches estimate the phase indirectly through its derivatives, i.e., the instantaneous frequency (IF) and group delay (GD). In the first stage, DNNs estimate the IF and GD from the amplitude; in the second stage, the phase is reconstructed so as to preserve the IF/GD information. Conventional second-stage methods either ignore the importance of high-amplitude time-frequency bins (e.g., the least squares-based method) or lack a solid model (e.g., the average-based method). To address these problems, we propose improvements to the second stage based on von Mises distribution-based maximum likelihood and weighted least squares. We also provide a theoretical discussion of phase reconstruction, including an investigation of the properties of the GD and of the roles of the IF/GD information in the inverse STFT. On the basis of this analysis, we propose a new phase-based feature, the inter-frequency phase difference (IFPD), and demonstrate its application in two-stage phase reconstruction algorithms. We conducted subjective and objective experiments to compare the performance of the proposed and conventional methods. The results confirm that the proposed method using the IFPD outperforms the other methods on all metrics.
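As a rough illustration of the phase derivatives mentioned in the abstract, the sketch below computes wrapped finite differences of an STFT phase along time (an IF-like quantity) and along frequency (a GD-like quantity). It is not the paper's method: the function names `wrap` and `phase_differences` are hypothetical, and the sign and scaling conventions are assumptions, since the abstract does not give the exact definitions. The IFPD feature itself is not reproduced here because the abstract does not define it.

```python
import numpy as np


def wrap(x):
    """Wrap angles in radians to the principal interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def phase_differences(phase):
    """Wrapped first differences of a phase array of shape (freq_bins, frames).

    Returns (if_like, gd_like):
      if_like: difference between consecutive frames, shape (freq_bins, frames - 1)
      gd_like: negated difference between consecutive bins, shape (freq_bins - 1, frames)
    The negation for gd_like is an assumed convention, not taken from the paper.
    """
    if_like = wrap(phase[:, 1:] - phase[:, :-1])
    gd_like = wrap(-(phase[1:, :] - phase[:-1, :]))
    return if_like, gd_like


# Synthetic wrapped phase with a known slope in each direction.
k = np.arange(5)[:, None]   # frequency-bin index
m = np.arange(8)[None, :]   # frame index
phase = wrap(0.7 * m - 0.3 * k)

if_like, gd_like = phase_differences(phase)

# A constant phase offset changes every phase value but neither difference,
# which is one reason such differences are easier targets than the raw phase.
if_shifted, gd_shifted = phase_differences(wrap(phase + 1.0))
print(np.allclose(if_like, if_shifted), np.allclose(gd_like, gd_shifted))
```

On this synthetic input the two differences recover the slopes 0.7 and 0.3 even though the raw phase values wrap, which is the property the two-stage approach relies on.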
Pages: 1667-1680
Number of pages: 14