Optimizing Speech Intelligibility in a Noisy Environment

Cited by: 40
Authors
Kleijn, W. Bastiaan [1, 2, 3, 4, 5, 6]
Crespo, Joao B. [7, 8]
Hendriks, Richard C. [9]
Petkov, Petko N. [4]
Sauert, Bastian [10, 11]
Vary, Peter [6, 12, 13, 14]
Affiliations
[1] Victoria Univ Wellington, Wellington, New Zealand
[2] Delft Univ Technol, Delft, Netherlands
[3] Royal Inst Technol KTH, Sound & Image Proc Lab, Stockholm, Sweden
[4] Global IP Solut, San Francisco, CA USA
[5] AT&T Bell Labs, Div Res, Murray Hill, NJ USA
[6] IEEE, New York, NY USA
[7] Delft Univ Technol, Circuits & Syst Grp, NL-2600 AA Delft, Netherlands
[8] ExSilent BV, Amsterdam, Netherlands
[9] Delft Univ Technol, NL-2600 AA Delft, Netherlands
[10] HEAD Acoust, Herzogenrath, Germany
[11] Rhein Westfal TH Aachen, Inst Commun Syst & Data Proc, Aachen, Germany
[12] Philips Commun Ind, Digital Signal Proc Grp, Nurnberg, Germany
[13] Rhein Westfal TH Aachen, Aachen, Germany
[14] Inst Commun Syst & Data Proc, London, England
Keywords
ENHANCEMENT; MODEL;
DOI
10.1109/MSP.2014.2365594
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808 ; 0809 ;
Abstract
Modern communication technology facilitates communication from anywhere to anywhere. As a result, low speech intelligibility has become a common problem, which is exacerbated by the lack of feedback to the talker about the rendering environment. In recent years, a range of algorithms has been developed to enhance the intelligibility of speech rendered in a noisy environment. We describe methods for intelligibility enhancement from a unified vantage point. Before one defines a measure of intelligibility, the level of abstraction of the representation must be selected. For example, intelligibility can be measured on the message, the sequence of words spoken, the sequence of sounds, or a sequence of states of the auditory system. Natural measures of intelligibility defined at the message level are mutual information and the hit-or-miss criterion. The direct evaluation of high-level measures requires quantitative knowledge of human cognitive processing. Lower-level measures can be derived from higher-level measures by making restrictive assumptions. We discuss the implementation and performance of some specific enhancement systems in detail, including speech intelligibility index (SII)-based systems and systems aimed at enhancing the sound-field where it is perceived by the listener. We conclude with a discussion of the current state of the field and open problems.
Pages: 43-54
Number of pages: 12