Text-Informed Audio Source Separation. Example-Based Approach Using Non-Negative Matrix Partial Co-Factorization

Cited by: 13
Authors
Le Magoarou, Luc [1]
Ozerov, Alexey [2]
Duong, Ngoc Q. K. [2]
Affiliations
[1] Inria Rennes Bretagne Atlantique, F-35042 Rennes, France
[2] Technicolor, F-35576 Cesson Sevigne, France
Source
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2015 / Vol. 79 / Issue 02
Keywords
Text-informed audio source separation; Non-negative matrix partial co-factorization; Excitation-filter model; Speech alignment; MODEL;
DOI
10.1007/s11265-014-0920-1
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Informed audio source separation, in which auxiliary information guides the separation process, has recently attracted considerable research interest because classical blind (non-informed) approaches often fail to deliver satisfactory performance in practical applications. In this paper we present a novel text-informed framework in which a target speech source is separated from the background in a mixture using the corresponding text. First, given the text, we produce a speech example using either a speech synthesizer or a human speaker. We then use this example to guide source separation; for that purpose, we introduce a new variant of the non-negative matrix partial co-factorization (NMPCF) model based on a so-called excitation-filter-channel speech model. This modeling allows linguistic information to be shared between the speech example and the speech in the mixture. The corresponding multiplicative update (MU) rules are derived for parameter estimation, and several extensions of the model are proposed and investigated. We perform extensive experiments to assess the effectiveness of the proposed approach in terms of source separation and alignment performance.
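The idea of partial co-factorization mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's excitation-filter-channel model: it assumes a plain two-matrix setup in which a speech dictionary `W_s` is shared between the example spectrogram `V_ex` and the mixture spectrogram `V_mix`, a squared Euclidean cost, and the standard multiplicative updates for that cost. The function names, the weight `lam`, and the component counts are hypothetical choices made only for illustration.

```python
import numpy as np

def nmpcf_euclidean(V_mix, V_ex, k_speech=8, k_bg=4, lam=1.0, n_iter=100, seed=0):
    """Simplified partial co-factorization (an assumption, not the paper's model):
         V_ex  ~ W_s @ H_ex              (speech example)
         V_mix ~ W_s @ H_s + W_b @ H_b   (mixture = speech + background)
       W_s is shared; all factors stay non-negative under multiplicative updates."""
    rng = np.random.default_rng(seed)
    eps = 1e-12  # guards against division by zero
    F, N = V_mix.shape
    M = V_ex.shape[1]
    W_s, W_b = rng.random((F, k_speech)), rng.random((F, k_bg))
    H_s, H_b = rng.random((k_speech, N)), rng.random((k_bg, N))
    H_ex = rng.random((k_speech, M))
    costs = []
    for _ in range(n_iter):
        # Activation updates; the mixture model is refreshed after each one.
        V_hat = W_s @ H_s + W_b @ H_b
        H_s *= (W_s.T @ V_mix) / (W_s.T @ V_hat + eps)
        V_hat = W_s @ H_s + W_b @ H_b
        H_b *= (W_b.T @ V_mix) / (W_b.T @ V_hat + eps)
        H_ex *= (W_s.T @ V_ex) / (W_s.T @ (W_s @ H_ex) + eps)
        # Background dictionary: driven by the mixture only.
        V_hat = W_s @ H_s + W_b @ H_b
        W_b *= (V_mix @ H_b.T) / (V_hat @ H_b.T + eps)
        # Shared speech dictionary: driven by both the mixture and the example.
        V_hat = W_s @ H_s + W_b @ H_b
        E_hat = W_s @ H_ex
        W_s *= (V_mix @ H_s.T + lam * (V_ex @ H_ex.T)) / (
            V_hat @ H_s.T + lam * (E_hat @ H_ex.T) + eps)
        # Record the joint cost after this sweep.
        V_hat = W_s @ H_s + W_b @ H_b
        E_hat = W_s @ H_ex
        costs.append(np.sum((V_mix - V_hat) ** 2) + lam * np.sum((V_ex - E_hat) ** 2))
    return W_s, H_s, H_ex, W_b, H_b, costs

def soft_mask_speech(V_mix, W_s, H_s, W_b, H_b, eps=1e-12):
    """Speech magnitude estimate via a ratio mask built from the fitted model."""
    S = W_s @ H_s
    return V_mix * S / (S + W_b @ H_b + eps)
```

In this sketch the example influences the mixture decomposition only through the shared dictionary `W_s`; the paper's model shares linguistic information through a richer excitation-filter-channel structure that is not reproduced here.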
Pages: 117-131
Page count: 15