SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech

Cited by: 67

Authors:

Ma, Jianfen [1,2]

Loizou, Philipos C. [1]

Affiliations:

[1] Univ Texas Dallas, Dept Elect Engn, Richardson, TX 75083 USA

[2] Taiyuan Univ Technol, Taiyuan 030024, Shanxi, Peoples R China

Source:

SPEECH COMMUNICATION | 2011 / Vol. 53 / Issue 03

Keywords:

Speech intelligibility; Speech enhancement; Speech intelligibility indices; RECEPTION THRESHOLD; SUBSPACE APPROACH; ENHANCEMENT; PARAMETERS; REDUCTION; COHERENCE; INDEX;

DOI:

10.1016/j.specom.2010.10.005

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Most of the existing intelligibility measures do not account for the distortions present in processed speech, such as those introduced by speech-enhancement algorithms. In the present study, we propose three new objective measures that can be used for prediction of intelligibility of processed (e.g., via an enhancement algorithm) speech in noisy conditions. All three measures use a critical-band spectral representation of the clean and noise-suppressed signals and are based on the measurement of the SNR loss incurred in each critical band after the corrupted signal goes through a speech enhancement algorithm. The proposed measures are flexible in that they can provide different weights to the two types of spectral distortions introduced by enhancement algorithms, namely spectral attenuation and spectral amplification distortions. The proposed measures were evaluated with intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech (consonants and sentences) corrupted by four different maskers (car, babble, train and street interferences). Highest correlation (r = -0.85) with sentence recognition scores was obtained using a variant of the SNR loss measure that only included vowel/consonant transitions and weak consonant information. High correlation was maintained for all noise types, with a maximum correlation (r = -0.88) achieved in street noise conditions. (C) 2010 Elsevier B.V. All rights reserved.
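The abstract does not give the formula, so the following is only a minimal sketch of the general idea, not the authors' published definition. It assumes (hypothetically) that the per-band SNR loss is the dB ratio of clean to processed band magnitudes, that the loss is clipped to a symmetric range `snr_lim`, and that attenuation and amplification distortions are scaled by separate weights `c_plus` and `c_minus`. The function name, parameter names, and default values are all illustrative assumptions.

```python
import numpy as np

def snr_loss_sketch(clean_bands, processed_bands, snr_lim=15.0,
                    c_plus=1.0, c_minus=1.0, band_weights=None, eps=1e-12):
    """Illustrative SNR-loss style score in [0, 1] (assumed form, not the paper's).

    clean_bands, processed_bands: arrays of shape (frames, bands) holding
    non-negative critical-band magnitudes of the clean and enhanced signals.
    """
    clean = np.asarray(clean_bands, dtype=float)
    proc = np.asarray(processed_bands, dtype=float)

    # Assumed per-band loss in dB: positive means the processed band was
    # attenuated relative to clean, negative means it was amplified.
    loss_db = 20.0 * np.log10((clean + eps) / (proc + eps))
    loss_db = np.clip(loss_db, -snr_lim, snr_lim)

    # Map to [0, 1], weighting the two distortion types separately.
    mapped = np.where(loss_db >= 0.0,
                      c_plus * loss_db / snr_lim,     # attenuation distortion
                      -c_minus * loss_db / snr_lim)   # amplification distortion

    if band_weights is None:
        band_weights = np.ones(clean.shape[1])  # assumption: uniform weights
    w = np.asarray(band_weights, dtype=float)

    per_frame = (mapped * w).sum(axis=1) / w.sum()  # weighted mean over bands
    return float(per_frame.mean())                  # mean over frames
```

Under these assumptions a larger score means more distortion, which is consistent with the negative correlations with intelligibility reported in the abstract. Setting `c_minus=0.0` would ignore amplification distortions entirely, illustrating the kind of flexibility the abstract describes.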


Pages: 340-354

Page count: 15
