SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech

Cited by: 67

Authors:

Ma, Jianfen [1,2]

Loizou, Philipos C. [1]

Affiliations:

[1] Univ Texas Dallas, Dept Elect Engn, Richardson, TX 75083 USA

[2] Taiyuan Univ Technol, Taiyuan 030024, Shanxi, Peoples R China

Source:

SPEECH COMMUNICATION | 2011 / Vol. 53 / Issue 03

Keywords:

Speech intelligibility; Speech enhancement; Speech intelligibility indices; RECEPTION THRESHOLD; SUBSPACE APPROACH; ENHANCEMENT; PARAMETERS; REDUCTION; COHERENCE; INDEX;

DOI:

10.1016/j.specom.2010.10.005

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Most of the existing intelligibility measures do not account for the distortions present in processed speech, such as those introduced by speech-enhancement algorithms. In the present study, we propose three new objective measures that can be used for prediction of intelligibility of processed (e.g., via an enhancement algorithm) speech in noisy conditions. All three measures use a critical-band spectral representation of the clean and noise-suppressed signals and are based on the measurement of the SNR loss incurred in each critical band after the corrupted signal goes through a speech enhancement algorithm. The proposed measures are flexible in that they can provide different weights to the two types of spectral distortions introduced by enhancement algorithms, namely spectral attenuation and spectral amplification distortions. The proposed measures were evaluated with intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech (consonants and sentences) corrupted by four different maskers (car, babble, train and street interferences). Highest correlation (r = -0.85) with sentence recognition scores was obtained using a variant of the SNR loss measure that only included vowel/consonant transitions and weak consonant information. High correlation was maintained for all noise types, with a maximum correlation (r = -0.88) achieved in street noise conditions. (C) 2010 Elsevier B.V. All rights reserved.
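The abstract describes the measures only in outline, so the following is a minimal sketch of the general idea rather than the authors' formulation. It assumes that the SNR loss in a band can be taken as the log ratio of the clean to the enhanced band magnitude (that is, the noise term is common to the input and output SNR), that the loss is clipped to a symmetric range before being mapped to [0, 1], and that bands are averaged with equal weight. The names `band_snr_loss_db`, `normalized_loss`, `snr_loss_score`, `limit_db`, `w_atten` and `w_amp` are hypothetical and do not come from the paper.

```python
import math


def band_snr_loss_db(clean_mag, enhanced_mag, eps=1e-12):
    """SNR loss (dB) in one band: > 0 for attenuation, < 0 for amplification.

    Assumption: the noise term is the same in the input and output SNR,
    so the loss reduces to a log ratio of clean to enhanced magnitude.
    """
    return 20.0 * math.log10((clean_mag + eps) / (enhanced_mag + eps))


def normalized_loss(loss_db, limit_db=15.0, w_atten=1.0, w_amp=1.0):
    """Clip the loss to [-limit_db, +limit_db] and scale it by a weight.

    w_atten weights attenuation distortion (positive loss) and w_amp
    weights amplification distortion (negative loss). All three
    parameters are illustrative choices, not values from the paper.
    """
    clipped = max(-limit_db, min(limit_db, loss_db))
    if clipped >= 0.0:
        return w_atten * clipped / limit_db
    return w_amp * (-clipped) / limit_db


def snr_loss_score(clean_bands, enhanced_bands, **weights):
    """Average the normalized per-band loss over all bands (equal weights)."""
    if not clean_bands or len(clean_bands) != len(enhanced_bands):
        raise ValueError("band lists must be non-empty and of equal length")
    losses = [
        normalized_loss(band_snr_loss_db(c, e), **weights)
        for c, e in zip(clean_bands, enhanced_bands)
    ]
    return sum(losses) / len(losses)


# Hypothetical band magnitudes for one frame: the first band is attenuated,
# the second is unchanged, the third is amplified.
clean = [1.0, 0.8, 0.5]
enhanced = [0.5, 0.8, 1.0]
both = snr_loss_score(clean, enhanced)
attenuation_only = snr_loss_score(clean, enhanced, w_amp=0.0)
```

Under these assumptions a larger score means more loss, which is consistent with the negative correlations with intelligibility reported in the abstract. Setting `w_amp=0.0` or `w_atten=0.0` isolates one of the two distortion types.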

Citation

Pages: 340-354

Page count: 15

44 references in total

[1] Allen, Jont B. How Do Humans Process and Recognize Speech? [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (04): 567-577

[2] [Anonymous], 1969, IEEE T ACOUST SPEECH, VAU17, P225

[3] [Anonymous], 1988, Objective measures of speech quality

[4] [Anonymous], P IEEE INT C AC SPEE

[5] [Anonymous], 2007, Speech Enhancement: Theory and Practice

[6] ANSI (American National Standards Institute), 1997, S351997 ANSI

[7] Beerends J. G., 2004, P WORKSH MEAS SPEECH

[8] Benesty, Jacob; Chen, Jingdong; Huang, Yiteng. On the importance of the Pearson correlation coefficient in noise reduction [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (04): 757-765

[9] Benesty J, 2009, SPRINGER TOP SIGN PR, V2, P1, DOI 10.1007/978-3-642-00296-0_1

[10] Chen, Jingdong; Benesty, Jacob; Huang, Yiteng; Doclo, Simon. New insights into the noise reduction Wiener filter [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (04): 1218-1234
