Binary and ratio time-frequency masks for robust speech recognition

Cited by: 184
Authors
Srinivasan, Soundararajan
Roman, Nicoleta
Wang, DeLiang
Affiliations
[1] Ohio State Univ, Dept Biomed Engn, Columbus, OH 43210 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[3] Ohio State Univ, Ctr Cognit Sci, Columbus, OH 43210 USA
Funding
National Science Foundation (USA);
Keywords
ideal binary mask; ratio mask; robust speech recognition; missing-data recognizer; binaural processing; speech segregation;
DOI
10.1016/j.specom.2006.09.003
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
A time-varying Wiener filter specifies the ratio of a target signal to a noisy mixture in a local time-frequency unit. We estimate this ratio using a binaural processor and derive a ratio time-frequency mask. This mask is used to extract the speech signal, which is then fed to a conventional speech recognizer operating in the cepstral domain. We compare the performance of this system with a missing-data recognizer that operates in the spectral domain using the time-frequency units that are dominated by speech. To apply the missing-data recognizer, the same binaural processor is used to estimate an ideal binary time-frequency mask, which selects a local time-frequency unit if the speech signal within the unit is stronger than the interference. We find that the performance of the missing-data recognizer is better on a small-vocabulary recognition task, but the performance of the conventional recognizer is substantially better when the vocabulary size is increased. © 2006 Elsevier B.V. All rights reserved.
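The two mask definitions in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the paper estimates both masks with a binaural processor, whereas this sketch assumes the per-unit speech and interference energies are already known, and it assumes the two sources are uncorrelated so that mixture energy is their sum. The function names `ideal_binary_mask` and `ratio_mask`, the `eps` guard, and the toy energy values are all hypothetical.

```python
import numpy as np


def ideal_binary_mask(speech_energy, noise_energy):
    """Select a time-frequency unit (1.0) when speech is stronger than interference."""
    return (speech_energy > noise_energy).astype(float)


def ratio_mask(speech_energy, noise_energy, eps=1e-12):
    """Wiener-style ratio of speech energy to mixture energy in each unit.

    Assumption: speech and interference are uncorrelated, so the mixture
    energy is approximated by the sum of the two energies.
    """
    return speech_energy / (speech_energy + noise_energy + eps)


# Toy energies on a 2 x 2 grid of time-frequency units (made-up values).
speech = np.array([[4.0, 1.0], [0.5, 9.0]])
noise = np.array([[1.0, 3.0], [0.5, 1.0]])

ibm = ideal_binary_mask(speech, noise)
irm = ratio_mask(speech, noise)
```

In this toy case the binary mask keeps only the two units where speech strictly dominates, while the ratio mask assigns every unit a soft weight between 0 and 1. A unit with equal speech and interference energy is dropped by the binary mask but weighted about 0.5 by the ratio mask.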
Pages: 1486-1501
Page count: 16