Model-Based STFT Phase Recovery for Audio Source Separation

Cited by: 35
Authors
Magron, Paul [1, 2]
Badeau, Roland [3]
David, Bertrand [3]
Affiliations
[1] Telecom ParisTech, F-75013 Paris, France
[2] Tampere Univ Technol, Lab Signal Proc, FI-33720 Tampere, Finland
[3] Univ Paris Saclay, Telecom ParisTech, LTCI, F-75013 Paris, France
Funding
Hungarian Scientific Research Fund;
Keywords
Phase recovery; sinusoidal modeling; phase unwrapping; auxiliary function method; audio source separation; NONNEGATIVE MATRIX FACTORIZATION; INFORMED SOURCE SEPARATION; SIGNAL ESTIMATION; RECONSTRUCTION; FREQUENCY;
DOI
10.1109/TASLP.2018.2811540
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
For audio source separation applications, it is common to estimate the magnitude of the short-time Fourier transform (STFT) of each source. In order to further synthesize time-domain signals, it is necessary to recover the phase of the corresponding complex-valued STFT. Most authors in this field choose a Wiener-like filtering approach, which boils down to using the phase of the original mixture. In this paper, a different standpoint is adopted. Many music events are partially composed of slowly varying sinusoids, and the STFT phase increment over time of those frequency components takes a specific form. This allows phase recovery by an unwrapping technique once a short-term frequency estimate has been obtained. Herein, a novel iterative source separation procedure is proposed that builds upon these results. It consists of minimizing the mixing error by means of the auxiliary function method. This procedure is initialized by exploiting the unwrapping technique in order to generate estimates that benefit from a temporal continuity property. Experiments conducted on realistic music pieces show that, given accurate magnitude estimates, this procedure outperforms the state-of-the-art consistent Wiener filter.
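The unwrapping idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the recursion, so the form used here is an assumption: for a stationary sinusoid with normalized frequency nu (cycles per sample), the STFT phase is taken to advance by 2*pi*hop*nu from one frame to the next. The function name `unwrap_phase` and all parameter names are hypothetical and not taken from the paper.

```python
import numpy as np

def unwrap_phase(phi0, nu, hop):
    """Propagate STFT phase over time under a stationary-sinusoid assumption.

    phi0: (F,) phase of the first frame, in radians
    nu:   (F, T) normalized frequency estimates, in cycles per sample
    hop:  hop size between consecutive frames, in samples
    Returns an (F, T) array of phases (not wrapped to [-pi, pi]).
    """
    n_freq, n_frames = nu.shape
    phi = np.empty((n_freq, n_frames))
    phi[:, 0] = phi0
    for t in range(1, n_frames):
        # Assumed increment: 2*pi * hop * nu between frame t-1 and frame t
        phi[:, t] = phi[:, t - 1] + 2 * np.pi * hop * nu[:, t]
    return phi
```

Combined with a magnitude estimate `V`, a complex STFT estimate could then be formed as `V * np.exp(1j * phi)`. This sketch covers only the initialization idea, not the iterative auxiliary-function procedure described in the abstract.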
Pages: 1091-1101
Number of pages: 11