Sparse linear regression with structured priors and application to denoising of musical audio

Cited by: 45
Authors
Fevotte, Cedric [1]
Torresani, Bruno [2, 3]
Daudet, Laurent [4]
Godsill, Simon J. [5]
Affiliations
[1] CNRS GET Telecom Paris ENST, Dept Signal Images, F-75014 Paris, France
[2] Univ Aix Marseille 1, Lab Anal Topol & Probabil, Dept Phys, F-3453 Marseille 13, France
[3] Univ Aix Marseille 1, Lab Anal Topol & Probabil, Dept Math, F-3453 Marseille 13, France
[4] Univ Paris 06, Inst Jean Le Rond Dalembert, Lab Acoust Musicale, Lutheries Acoust Mus, F-75015 Paris, France
[5] Univ Cambridge, Dept Engn, Signal Proc Grp, Cambridge CB2 1PZ, England
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2008, Vol. 16, No. 1
Keywords
Bayesian variable selection; denoising; Markov chain Monte Carlo (MCMC) methods; nonlinear signal approximation; sparse component analysis; sparse regression; sparse representations
DOI
10.1109/TASL.2007.909290
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We describe in this paper an audio denoising technique based on sparse linear regression with structured priors. The noisy signal is decomposed as a linear combination of atoms belonging to two modified discrete cosine transform (MDCT) bases, plus a residual part containing the noise. One MDCT basis has a long time resolution, and thus high frequency resolution, and is aimed at modeling tonal parts of the signal, while the other MDCT basis has short time resolution and is aimed at modeling transient parts (such as attacks of notes). The problem is formulated within a Bayesian setting. Conditional upon an indicator variable which is either 0 or 1, one expansion coefficient is set to zero or given a hierarchical prior. Structured priors are employed for the indicator variables; using two types of Markov chains, persistency along the time axis is favored for expansion coefficients of the tonal layer, while persistency along the frequency axis is favored for the expansion coefficients of the transient layer. Inference about the denoised signal and model parameters is performed using a Gibbs sampler, a standard Markov chain Monte Carlo (MCMC) sampling technique. We present results for denoising of a short glockenspiel excerpt and a long polyphonic music excerpt. Our approach is compared with unstructured sparse regression and with structured sparse regression in a single resolution MDCT basis (no transient layer). The results show that better denoising is obtained, both from signal-to-noise ratio measurements and from subjective criteria, when both a transient and tonal layer are used, in conjunction with our proposed structured prior framework.
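The structured prior described in the abstract can be illustrated with a minimal sketch. It draws binary indicator maps from two-state Markov chains, running along the time axis for a tonal layer and along the frequency axis for a transient layer, and then sets each coefficient to zero wherever its indicator is 0. This is not the authors' implementation: the function names, the transition probabilities, the layer sizes, and the use of a plain Gaussian in place of the hierarchical prior are all assumptions made for illustration, and no MDCT or Gibbs sampling step is included.

```python
import numpy as np


def sample_indicator_chain(n_steps, p_init_on, p_stay_on, p_stay_off, rng):
    """Draw one binary sequence from a two-state Markov chain.

    p_stay_on  : probability of remaining at 1 given the previous state was 1
    p_stay_off : probability of remaining at 0 given the previous state was 0
    (hypothetical parameter names, chosen for this sketch)
    """
    gamma = np.zeros(n_steps, dtype=int)
    gamma[0] = rng.random() < p_init_on
    for t in range(1, n_steps):
        if gamma[t - 1] == 1:
            gamma[t] = rng.random() < p_stay_on
        else:
            # Leave the "off" state with probability 1 - p_stay_off.
            gamma[t] = rng.random() >= p_stay_off
    return gamma


def sample_indicator_map(n_freq, n_frames, axis, rng, **chain_params):
    """Indicator map of shape (n_freq, n_frames) with chains along `axis`.

    axis="time"      : one chain per frequency bin (persistency in time)
    axis="frequency" : one chain per frame (persistency in frequency)
    """
    gamma = np.zeros((n_freq, n_frames), dtype=int)
    if axis == "time":
        for k in range(n_freq):
            gamma[k, :] = sample_indicator_chain(n_frames, rng=rng, **chain_params)
    elif axis == "frequency":
        for n in range(n_frames):
            gamma[:, n] = sample_indicator_chain(n_freq, rng=rng, **chain_params)
    else:
        raise ValueError("axis must be 'time' or 'frequency'")
    return gamma


def sample_coefficients(gamma, scale, rng):
    """Zero where the indicator is 0; Gaussian elsewhere.

    Assumption: a fixed-scale Gaussian stands in for the hierarchical prior.
    """
    return gamma * rng.normal(0.0, scale, size=gamma.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chain = dict(p_init_on=0.1, p_stay_on=0.9, p_stay_off=0.95)
    # Long windows: many frequency bins, few frames (illustrative sizes).
    tonal = sample_indicator_map(64, 8, "time", rng, **chain)
    # Short windows: few frequency bins, many frames (illustrative sizes).
    transient = sample_indicator_map(8, 64, "frequency", rng, **chain)
    coeffs = sample_coefficients(tonal, 1.0, rng)
    print(tonal.shape, transient.shape, int((coeffs != 0).sum()))
```

With a high `p_stay_on`, active indicators tend to appear in runs along the chosen axis, which is the kind of persistency the abstract attributes to the tonal layer (time) and the transient layer (frequency).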
Pages: 174-185
Number of pages: 12