On the optimality of ideal binary time-frequency masks

Citations: 107
Authors
Li, Yipeng [1]
Wang, DeLiang [1, 2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Keywords
Ideal binary mask; Ideal ratio mask; Optimality; Sound separation; Wiener filter; Monaural speech; Separation; Enhancement
DOI
10.1016/j.specom.2008.09.001
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The concept of ideal binary time-frequency masks has received attention recently in monaural and binaural sound separation. Although often assumed, the optimality of ideal binary masks in terms of signal-to-noise ratio has not been rigorously addressed. In this paper we give a formal treatment of this issue and clarify the conditions under which ideal binary masks are optimal. We also experimentally compare the performance of ideal binary masks to that of ideal ratio masks on a speech mixture database and a music database. The results show that ideal binary masks are close in performance to ideal ratio masks, which are closely related to the Wiener filter, the theoretically optimal linear filter. © 2008 Elsevier B.V. All rights reserved.
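
To make the two mask types named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes synthetic complex time-frequency coefficients in place of a real transform of audio, a local criterion of 0 dB for the binary mask, and SNR measured directly in the time-frequency domain (a reasonable proxy only when the transform preserves energy). All function names are hypothetical.

```python
import numpy as np

# Smallest positive float, used only to avoid division by zero and log(0).
EPS = np.finfo(float).tiny


def ideal_binary_mask(target, interference, lc_db=0.0):
    """Return 1 where the local target-to-interference ratio exceeds lc_db, else 0."""
    target_energy = np.abs(target) ** 2 + EPS
    interference_energy = np.abs(interference) ** 2 + EPS
    local_snr_db = 10.0 * np.log10(target_energy / interference_energy)
    return (local_snr_db > lc_db).astype(float)


def ideal_ratio_mask(target, interference):
    """Return the Wiener-like gain |S|^2 / (|S|^2 + |N|^2) for each unit."""
    target_energy = np.abs(target) ** 2
    interference_energy = np.abs(interference) ** 2
    return target_energy / (target_energy + interference_energy + EPS)


def output_snr_db(target, estimate):
    """SNR in dB, treating everything in the estimate that is not the target as noise."""
    error_energy = np.sum(np.abs(target - estimate) ** 2)
    return 10.0 * np.log10(np.sum(np.abs(target) ** 2) / (error_energy + EPS))


# Synthetic stand-ins for the time-frequency representations of two sources
# (assumption: 64 frequency channels by 100 time frames, random coefficients).
rng = np.random.default_rng(0)
shape = (64, 100)


def random_source():
    coefficients = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    return coefficients * rng.exponential(size=shape)


target = random_source()
interference = random_source()
mixture = target + interference

ibm = ideal_binary_mask(target, interference)
irm = ideal_ratio_mask(target, interference)
random_mask = rng.integers(0, 2, size=shape).astype(float)

snr_mixture = output_snr_db(target, mixture)
snr_ibm = output_snr_db(target, ibm * mixture)
snr_irm = output_snr_db(target, irm * mixture)
snr_random = output_snr_db(target, random_mask * mixture)

print(f"unprocessed mixture: {snr_mixture:.2f} dB")
print(f"ideal binary mask:   {snr_ibm:.2f} dB")
print(f"ideal ratio mask:    {snr_irm:.2f} dB")
print(f"random binary mask:  {snr_random:.2f} dB")
```

In this simplified setting, keeping a unit leaves the interference energy as error and discarding it leaves the target energy as error, so the 0 dB binary mask picks the smaller error in every unit and cannot do worse than any other binary mask. Whether that carries over to real signals depends on the conditions the paper sets out, which this sketch does not reproduce.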
Pages: 230-239
Page count: 10