SIGNAL RECONSTRUCTION FROM MEL-SPECTROGRAM BASED ON BI-LEVEL CONSISTENCY OF FULL-BAND MAGNITUDE AND PHASE

Times cited: 0
Authors
Masuyama, Yoshiki [1]
Ueno, Natsuki [1]
Ono, Nobutaka [1]
Affiliations
[1] Tokyo Metropolitan Univ, Tokyo, Japan
Source
2023 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, WASPAA | 2023
Keywords
Phase reconstruction; waveform synthesis; mel-spectrogram; bi-level consistency; proximal splitting methods; ALTERNATING LINEARIZED MINIMIZATION; ALGORITHM; NONCONVEX;
DOI
10.1109/WASPAA58266.2023.10248111
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and is applicable to various audio signals. In this paper, we jointly reconstruct the full-band magnitude and phase by considering the bi-level relationships among the time-domain signal, its STFT coefficients, and its mel-spectrogram. The proposed method is formulated as a rigorous optimization problem and estimates the full-band magnitude based on the criterion used in GLA. Our experiments demonstrate the effectiveness of the proposed method on speech, music, and environmental signals.
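For orientation, the classical Griffin-Lim algorithm mentioned in the abstract can be sketched as below. This is a minimal illustration of the full-band baseline only, not the proposed bi-level method: it assumes the full-band STFT magnitude is already known and alternates between enforcing that magnitude and projecting onto the set of consistent STFTs. The function names (`stft`, `istft`, `griffin_lim`, `spectral_convergence`), the random phase initialization, and the iteration count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def stft(x, win, hop):
    """Frame the signal, window each frame, and take the real FFT."""
    n_fft = len(win)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, win, hop, length):
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    n_fft = len(win)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Guard against division by zero where the window vanishes.
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, win, hop, length, n_iter=100, seed=0):
    """Estimate a signal whose STFT magnitude is close to `mag` (assumed setup: random initial phase)."""
    rng = np.random.default_rng(seed)
    X = mag * np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(X, win, hop, length)       # project onto consistent STFTs
        Y = stft(x, win, hop)
        X = mag * np.exp(1j * np.angle(Y))   # keep the phase, restore the given magnitude
    return istft(X, win, hop, length)

def spectral_convergence(mag, x, win, hop):
    """Relative error between the target magnitude and the magnitude of the estimate's STFT."""
    est = np.abs(stft(x, win, hop))
    return np.linalg.norm(est - mag) / np.linalg.norm(mag)
```

The sketch relies only on the redundancy of the STFT, which is the property the abstract attributes to GLA. Reconstructing from a mel-spectrogram, as the paper does, additionally requires estimating the full-band magnitude, which this example does not attempt.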
Pages: 5