Time-Frequency Sparsity by Removing Perceptually Irrelevant Components Using a Simple Model of Simultaneous Masking

Cited by: 60
Authors
Balazs, Peter [1]
Laback, Bernhard [1]
Eckel, Gerhard [2]
Deutsch, Werner A. [1]
Affiliations
[1] Austrian Acad Sci, Acoust Res Inst, A-1040 Vienna, Austria
[2] Univ Mus & Dramat Arts, Inst Elect Mus & Acoust, A-8010 Graz, Austria
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 01
Keywords
Efficient algorithm; Gabor filter; Gabor transform; irrelevance filter; masking model; simultaneous masking; sparse representation; spectral masking; time-variant filter; NOISE; ADDITIVITY; INTEGRATION; AUDIBILITY; THRESHOLDS; PATTERNS;
DOI
10.1109/TASL.2009.2023164
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We present an algorithm for removing time-frequency components, found by a standard Gabor transform, of a "real-world" sound while causing no audible difference to the original sound after resynthesis. Thus, this representation is made sparser. The selection of removable components is based on a simple model of simultaneous masking in the auditory system. Important goals were the applicability to any real-world music and speech sound, integrating mutual masking effects between time-frequency components, coping with the time-frequency spread of such an operation, and computational efficiency. The proposed algorithm first determines an estimation of the masked threshold within an analysis window. The masked threshold function is then shifted in level by an amount determined experimentally, and all components falling below this function (the irrelevance threshold) are removed. This shift gives a conservative way to deal with uncertainty effects resulting from removing time-frequency components and with inaccuracies in the masking model. The removal of components is described as an adaptive Gabor multiplier. Thirty-six normal hearing subjects participated in an experiment to determine the maximum shift value for which they could not discriminate the irrelevance filtered signal from the original signal. On average across the test stimuli, 32 percent of the time-frequency components fell below the irrelevance threshold.
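The thresholding step described in the abstract can be illustrated with a minimal sketch for a single analysis window. This is not the authors' implementation. The function names, the triangular spreading function, the slope values, the power-additive combination of maskers, and the assumption that bins are evenly spaced on a Bark-like axis are all hypothetical choices made for illustration. The sign convention for the shift (negative values lower the threshold and remove fewer components) is likewise an assumption.

```python
import math

def masked_threshold_db(levels_db, lower_slope=27.0, upper_slope=10.0, bin_width=1.0):
    """Estimate a masked threshold (dB) per bin for one analysis window.

    Assumptions (illustrative, not from the paper): every component spreads
    masking with a triangular shape, steeper toward lower frequencies, and
    the contributions of all maskers add in the power domain.
    """
    n = len(levels_db)
    thresholds = []
    for k in range(n):
        power = 0.0
        for j in range(n):
            if j == k:
                continue  # a component does not mask itself
            dist = (k - j) * bin_width
            # Target above the masker: shallow slope; below: steep slope.
            slope = upper_slope if dist > 0 else lower_slope
            contrib_db = levels_db[j] - slope * abs(dist)
            power += 10.0 ** (contrib_db / 10.0)
        thresholds.append(10.0 * math.log10(power) if power > 0 else float("-inf"))
    return thresholds

def irrelevance_mask(levels_db, shift_db=0.0):
    """Binary multiplier symbol: 1 keeps a component, 0 removes it.

    The irrelevance threshold is the masked threshold shifted by shift_db;
    components whose level falls below it are removed.
    """
    thresholds = masked_threshold_db(levels_db)
    return [1 if lvl >= thr + shift_db else 0
            for lvl, thr in zip(levels_db, thresholds)]

def apply_mask(coefficients, mask):
    """Multiply time-frequency coefficients by the binary symbol."""
    return [c * m for c, m in zip(coefficients, mask)]

# One strong component flanked by two weak ones.
levels = [0.0, 70.0, 0.0]
print(irrelevance_mask(levels))                   # [0, 1, 0]
print(irrelevance_mask(levels, shift_db=-100.0))  # [1, 1, 1]
```

In a full pipeline the mask would be computed for every analysis window of a Gabor (short-time Fourier) transform and applied to the complex coefficients before resynthesis, which is what makes the operation a time-variant filter.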
Pages: 34-49
Page count: 16