On the optimality of ideal binary time-frequency masks

Cited by: 108

Authors:

Li, Yipeng [1]

Wang, DeLiang [1,2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA

Source:

SPEECH COMMUNICATION | 2009, Vol. 51, No. 3

Keywords:

Ideal binary mask; Ideal ratio mask; Optimality; Sound separation; Wiener filter; MONAURAL SPEECH; SEPARATION; ENHANCEMENT;

DOI:

10.1016/j.specom.2008.09.001

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The concept of ideal binary time-frequency masks has received attention recently in monaural and binaural sound separation. Although often assumed, the optimality of ideal binary masks in terms of signal-to-noise ratio has not been rigorously addressed. In this paper we give a formal treatment of this issue and clarify the conditions for ideal binary masks to be optimal. We also experimentally compare the performance of ideal binary masks to that of ideal ratio masks on a speech mixture database and a music database. The results show that ideal binary masks are close in performance to ideal ratio masks, which are closely related to the Wiener filter, the theoretically optimal linear filter. (c) 2008 Elsevier B.V. All rights reserved.
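For readers unfamiliar with the two mask types named in the abstract, the following is a minimal sketch of commonly used definitions, not the paper's own formulation or experiments. It assumes that the target and interference energies in each time-frequency unit are known in advance and that the two sources are uncorrelated within a unit; the function names, the 0 dB threshold, and the toy energy values are all hypothetical.

```python
import math


def ideal_binary_mask(target, interference, lc_db=0.0):
    """Return 1 for units whose local SNR exceeds lc_db, else 0.

    target / interference: per-unit energies (assumed known, hence "ideal").
    lc_db: local SNR threshold in dB (0 dB is an assumed default).
    """
    threshold = 10.0 ** (lc_db / 10.0)
    return [1 if s > n * threshold else 0 for s, n in zip(target, interference)]


def ideal_ratio_mask(target, interference):
    """Return the Wiener-like gain S / (S + N) for each unit."""
    return [s / (s + n) if (s + n) > 0 else 0.0
            for s, n in zip(target, interference)]


def output_snr_db(target, interference, mask):
    """Expected output SNR under the uncorrelated-sources assumption.

    With gain g in a unit, the expected error energy is
    (1 - g)^2 * S (lost target) + g^2 * N (residual interference).
    """
    signal = sum(target)
    error = sum((1 - g) ** 2 * s + g ** 2 * n
                for s, n, g in zip(target, interference, mask))
    if error == 0:
        return math.inf
    return 10.0 * math.log10(signal / error)


# Toy per-unit energies (illustrative values only, not data from the paper).
target = [4.0, 1.0, 9.0, 0.5]
interference = [1.0, 2.0, 1.0, 2.0]

ibm = ideal_binary_mask(target, interference)
irm = ideal_ratio_mask(target, interference)
snr_ibm = output_snr_db(target, interference, ibm)
snr_irm = output_snr_db(target, interference, irm)
print(ibm, [round(g, 3) for g in irm])
print(round(snr_ibm, 2), round(snr_irm, 2))
```

Under this simplified energy model the ratio mask leaves an error of S·N/(S+N) per unit while the binary mask leaves min(S, N), so the ratio mask's SNR is never lower; how close the two are on real speech and music mixtures is what the paper evaluates.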


Pages: 230-239

Page count: 10
