Reasons why Current Speech-Enhancement Algorithms do not Improve Speech Intelligibility and Suggested Solutions

Times cited: 198
Authors
Loizou, Philipos C. [1]
Kim, Gibak [2]
Affiliations
[1] Univ Texas Dallas, Dept Elect Engn, Richardson, TX 75083 USA
[2] Daegu Univ, Sch Elect Engn, Coll Informat & Commun, Taegu 712714, South Korea
Source
IEEE Transactions on Audio, Speech, and Language Processing | 2011 / Vol. 19 / No. 1
Keywords
Ideal binary mask; speech distortions; speech enhancement; speech intelligibility improvement; noise; recognition; index
DOI
10.1109/TASL.2010.2045180
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Existing speech-enhancement algorithms can improve speech quality but not speech intelligibility, and the reasons for this are unclear. In this paper, we present a theoretical framework that can be used to analyze factors that may influence the intelligibility of processed speech. More specifically, the framework focuses on a fine-grained analysis of the distortions introduced by speech-enhancement algorithms. We hypothesize that if these distortions are properly controlled, large gains in intelligibility can be achieved. To test this hypothesis, we conducted intelligibility tests in which human listeners were presented with processed speech containing controlled speech distortions. The aim of these tests was to assess the perceptual effect on intelligibility of the various distortions that speech-enhancement algorithms can introduce. Results with three different enhancement algorithms indicated that certain distortions are more detrimental to speech intelligibility than others. When these distortions were properly controlled, however, human listeners obtained large gains in intelligibility, even with spectral-subtractive algorithms, which are known to degrade both speech quality and intelligibility.
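The abstract does not specify how the distortions are "controlled". The sketch below shows one way to make the idea concrete. It is not the authors' implementation and rests on several assumptions: we have oracle access to the clean magnitude spectrum; each time-frequency bin of the enhanced spectrum is labeled as attenuation (enhanced at or below clean), small amplification, or large amplification; and the boundary between small and large amplification is a factor-of-2 magnitude limit (about 6.02 dB). That limit, and the names AMP_LIMIT, classify_bin, constrain and limit_db, are hypothetical choices made for illustration. Under these assumptions, "controlling" distortions means zeroing the bins with large amplification.

```python
import math

# Hypothetical magnitude-ratio limit separating small from large amplification.
# A factor of 2 corresponds to 20*log10(2) ~= 6.02 dB. Illustrative assumption only.
AMP_LIMIT = 2.0


def classify_bin(clean_mag, enh_mag, amp_limit=AMP_LIMIT):
    """Label one T-F bin by how the enhanced magnitude deviates from the clean one."""
    if enh_mag <= clean_mag:
        return "attenuation"
    if enh_mag <= amp_limit * clean_mag:
        return "amplification_small"
    return "amplification_large"


def constrain(clean, enhanced, amp_limit=AMP_LIMIT):
    """Zero every bin whose amplification exceeds amp_limit; keep all other bins.

    Requires the clean (oracle) magnitudes, so this is an analysis tool,
    not a deployable enhancement algorithm.
    """
    return [
        0.0 if classify_bin(c, e, amp_limit) == "amplification_large" else e
        for c, e in zip(clean, enhanced)
    ]


def limit_db(amp_limit=AMP_LIMIT):
    """Express the magnitude-ratio limit in decibels."""
    return 20.0 * math.log10(amp_limit)


if __name__ == "__main__":
    clean = [1.0, 1.0, 1.0, 0.5]
    enhanced = [0.8, 1.5, 3.0, 0.5]
    print([classify_bin(c, e) for c, e in zip(clean, enhanced)])
    print(constrain(clean, enhanced))
    print(round(limit_db(), 2))
```

In this toy frame, the third bin is amplified threefold relative to the clean magnitude, which exceeds the assumed limit, so it is the only bin set to zero. Attenuated bins and mildly amplified bins pass through unchanged.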
Pages: 47-56
Page count: 10