Optimizing Speech Intelligibility in a Noisy Environment

Cited by: 40
Authors
Kleijn, W. Bastiaan [1 ,2 ,3 ,4 ,5 ,6 ]
Crespo, Joao B. [7 ,8 ]
Hendriks, Richard C. [9 ]
Petkov, Petko N. [4 ]
Sauert, Bastian [10 ,11 ]
Vary, Peter [6 ,12 ,13 ,14 ]
Affiliations
[1] Victoria Univ Wellington, Wellington, New Zealand
[2] Delft Univ Technol, Delft, Netherlands
[3] Royal Inst Technol KTH, Sound & Image Proc Lab, Stockholm, Sweden
[4] Global IP Solut, San Francisco, CA USA
[5] AT&T Bell Labs, Div Res, Murray Hill, NJ USA
[6] IEEE, New York, NY USA
[7] Delft Univ Technol, Circuits & Syst Grp, NL-2600 AA Delft, Netherlands
[8] ExSilent BV, Amsterdam, Netherlands
[9] Delft Univ Technol, NL-2600 AA Delft, Netherlands
[10] HEAD Acoust, Herzogenrath, Germany
[11] Rhein Westfal TH Aachen, Inst Commun Syst & Data Proc, Aachen, Germany
[12] Philips Commun Ind, Digital Signal Proc Grp, Nurnberg, Germany
[13] Rhein Westfal TH Aachen, Aachen, Germany
[14] Inst Commun Syst & Data Proc, London, England
Keywords
ENHANCEMENT; MODEL;
DOI
10.1109/MSP.2014.2365594
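Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Subject classification code
0808 ; 0809 ;
Abstract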
Modern communication technology facilitates communication from anywhere to anywhere. As a result, low speech intelligibility has become a common problem, which is exacerbated by the lack of feedback to the talker about the rendering environment. In recent years, a range of algorithms has been developed to enhance the intelligibility of speech rendered in a noisy environment. We describe methods for intelligibility enhancement from a unified vantage point. Before one defines a measure of intelligibility, the level of abstraction of the representation must be selected. For example, intelligibility can be measured on the message, the sequence of words spoken, the sequence of sounds, or a sequence of states of the auditory system. Natural measures of intelligibility defined at the message level are mutual information and the hit-or-miss criterion. The direct evaluation of high-level measures requires quantitative knowledge of human cognitive processing. Lower-level measures can be derived from higher-level measures by making restrictive assumptions. We discuss the implementation and performance of some specific enhancement systems in detail, including speech intelligibility index (SII)-based systems and systems aimed at enhancing the sound-field where it is perceived by the listener. We conclude with a discussion of the current state of the field and open problems.
Pages: 43-54
Page count: 12
Related papers
30 items in total
[1]   How Do Humans Process and Recognize Speech? [J].
Allen, Jont B. .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1994, 2 (04) :567-577
[2]  
[Anonymous], ACOUSTICS SIGNAL PRO
[3]  
[Anonymous], 2013, INTERSPEECH
[4]  
[Anonymous], S351997 ANSI
[5]  
[Anonymous], P INTERSPEECH
[6]  
[Anonymous], ITG FACHBERICHT SPRA
[7]   A glimpsing model of speech perception in noise [J].
Cooke, M .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2006, 119 (03) :1562-1573
[8]  
Cooke M, 2013, INTERSPEECH, P3519
[9]   The listening talker: A review of human and algorithmic context-induced modifications of speech [J].
Cooke, Martin ;
King, Simon ;
Garnier, Maeva ;
Aubanel, Vincent .
COMPUTER SPEECH AND LANGUAGE, 2014, 28 (02) :543-571
[10]   Evaluating the intelligibility benefit of speech modifications in known noise conditions [J].
Cooke, Martin ;
Mayo, Catherine ;
Valentini-Botinhao, Cassia ;
Stylianou, Yannis ;
Sauert, Bastian ;
Tang, Yan .
SPEECH COMMUNICATION, 2013, 55 (04) :572-585