Making Machines Understand Us in Reverberant Rooms

Cited by: 179
Authors
Yoshioka, Takuya [1]
Sehr, Armin [2]
Delcroix, Marc [1,3]
Kinoshita, Keisuke [1]
Maas, Roland [4]
Nakatani, Tomohiro [5]
Kellermann, Walter [4,6]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
[2] Ericsson Eurolab, Nurnberg, Germany
[3] Pixela Corp, Osaka, Japan
[4] Univ Erlangen Nurnberg, Nurnberg, Germany
[5] NTT Corp, Basic Res Lab, Kyoto, Japan
[6] AT&T Bell Labs, Murray Hill, NJ USA
Keywords
CONVOLUTIONAL DISTORTION; MODEL ADAPTATION; SPEECH; DEREVERBERATION; FEATURES; COMPENSATION; SUPPRESSION
DOI
10.1109/MSP.2012.2205029
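Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract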
Speech recognition technology has left the research laboratory and is increasingly coming into practical use, enabling a wide spectrum of innovative and exciting voice-driven applications that are radically changing our way of accessing digital services and information. Most of today's applications still require a microphone located near the talker. However, almost all of these applications would benefit from distant-talking speech capturing, where talkers are able to speak at some distance from the microphones without the encumbrance of handheld or body-worn equipment [1]. For example, applications such as meeting speech recognition, automatic annotation of consumer-generated videos, speech-to-speech translation in teleconferencing, and hands-free interfaces for controlling consumer products, such as interactive TV, will greatly benefit from distant-talking operation. Furthermore, for a number of unexplored but important applications, distant microphones are a prerequisite. This means that distant-talking speech recognition technology is essential for extending the availability of speech recognizers as well as enhancing the convenience of existing speech recognition applications. © 2012 IEEE.
Pages: 114-126
Number of pages: 13
Related papers
48 entries in total
[1]   IMAGE METHOD FOR EFFICIENTLY SIMULATING SMALL-ROOM ACOUSTICS [J].
ALLEN, JB ;
BERKLEY, DA .
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65 (04) :943-950
[2]  
[Anonymous], 2000, INTERSPEECH, DOI 10.1016/S0167-6393(03)00016-5
[3]  
[Anonymous], 2009, Distant Speech Recognition
[4]  
Buchner H, 2010, SIGNALS COMMUN TECHN, P311, DOI 10.1007/978-1-84996-056-4_10
[5]  
Delcroix M, 2011, ROBUST SPEECH RECOGNITION OF UNCERTAIN OR MISSING DATA: THEORY AND APPLICATIONS, P225, DOI 10.1007/978-3-642-21317-5_9
[6]   Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion [J].
Deng, L ;
Droppo, J ;
Acero, A .
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03) :412-421
[7]  
Droppo J., 2008, Springer Handbook of Speech Processing, P653
[8]   Correlation-Based and Model-Based Blind Single-Channel Late-Reverberation Suppression in Noisy Time-Varying Acoustical Environments [J].
Erkelens, Jan S. ;
Heusdens, Richard .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07) :1746-1765
[9]   Modulation Spectral Features for Robust Far-Field Speaker Identification [J].
Falk, Tiago H. ;
Chan, Wai-Yip .
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (01) :90-100
[10]   SPEAKER-INDEPENDENT ISOLATED WORD RECOGNITION USING DYNAMIC FEATURES OF SPEECH SPECTRUM [J].
FURUI, S .
IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1986, 34 (01) :52-59